I, once again, find myself browsing Hacker News in bed. I came across a submission titled “AI documentation you can talk to, for every repo” and thought it looked interesting. It seemed to be some kind of automated AI/LLM documentation generator for software projects on GitHub. Like many developers, I’ve been following the applications of this new technology closely to see where I can integrate it into my own workflows. Coincidentally, I had also been experimenting with using AI to help me build a documentation site for my own projects, with little success. Maybe they had figured out something I hadn’t.
To my surprise, one of my projects was popular enough to have already been indexed by this DeepWiki site, so I dug in to evaluate it. To my horror, it was a fractal of misinformation and abusive design.
From my bed, I started a journey that I regret getting into. It’s now the next day and I feel the need to take a step back to reflect on the larger picture of open source and AI.
Let’s start with specifics.
DeepWiki
DeepWiki appears to be a project by the team behind Devin, “the AI software engineer”. It’s a company that has gotten a lot of negative press for overpromising and underdelivering. This is hardly unique in the current AI hype landscape, but they could be considered one of the worst offenders. They stand out in my mind because their rhetoric has revolved around replacing people rather than empowering them. It’s the kind of market positioning that appeals to managers: it tries to convince industries to fire their staff and replace them with AI. It’s gross, but effective.
However, it seems like they have been trying to salvage their image by creating developer tools like DeepWiki. Presumably the idea is to gain mindshare with engineers, then replace them later. DeepWiki is clearly a sales tool for Devin.
At this point my bias is probably showing, but I wanted to evaluate DeepWiki as an independent project to see what can be learned for my own uses. Let’s start with what I liked.
The Good Parts
When I first loaded up the site for my own project, I liked the design. I had been evaluating various software documentation projects, and DeepWiki felt clean. It was also reasonably fast. I like the navigation layout and general information hierarchy. There is navigation for the whole project, plus a separate navigation for each page. There’s nothing innovative here, but it has many of the features of a modern doc site. However, the giant AI chat box at the bottom was annoying.
Looking at the content layout, I like how many sections link to the actual source from which they were generated. This is something early AI tools omitted.
On the surface, DeepWiki seems like a useful tool for people looking to understand software at a deeper level. Sadly, things fell apart once I started looking at the actual generated content.
Going Rogue
The content on DeepWiki ranges from confusing, to misleading, to outright wrong. For example, the diagrams it generates inadvertently showcase how little actual reasoning current LLMs are capable of. The graphs just don’t make any sense.
Every text section I looked at seemed fine at first glance, but contained at least one minor factual issue on closer inspection.
Instead of taking the time to point out each issue, I’ll focus on one major issue I saw. I think this issue illustrates why DeepWiki’s approach is fundamentally counterproductive and, at best, a waste of everyone’s time.
The Non-Existent VS Code Extension
Oops.
For background, Codebook is a spell checker for code. It integrates with a few different code editors, usually via some kind of editor extension. Codebook started out supporting only the Zed editor, but the community has added support for more editors over time, with one notable exception: VS Code.
Therefore, I was surprised that DeepWiki not only mentioned this non-existent integration, but went so far as to recommend it as the primary installation method.
For a user, this is not great. This isn’t just some slightly wrong implementation detail, but a whole chunk of missing functionality that could lead real people down a confusing, time-wasting rabbit hole.
What’s worse, this content is published on the web where search engines and other bots can find it and potentially amplify it further. An Ouroboros of AI slop.
As the maintainer of this project, I can’t help but feel a loss of control.
What Went Wrong
When I posted this issue to the aforementioned Hacker News post, I got a defensive reply from someone I assume works on DeepWiki. They pointed out that there is in fact a VS Code extension in the Codebook source code. The AI is right. I, the actual maintainer, am wrong.
Now, there is code for a VS Code extension there that I had been working on. However, this extension is not published to the VS Code marketplace and does not work. Because of this, it is not supported and not mentioned in the main Codebook documentation. Yet, it’s now promoted as the main way to use Codebook?
Their LLMs have hallucinated an alternate reality that does not exist.
The fact that the DeepWiki developer did not see a problem with this should be enough for anyone to avoid this product.
Anyway, I’ve brought up this particular issue not to find a solution for this specific problem, but to point out the broader implications of sites like this.
Communication Breakdown
The bigger issue is that DeepWiki has undermined my control over the communication channels I have set up with users: the main documentation and other GitHub features like the issue tracker. Very few humans actually look at source code unless they have a specific issue. A few work-in-progress folders for unreleased and unadvertised features generally aren’t a concern for them. If something does concern them, they can ask on the issue tracker. I’ve never had that happen though.
With DeepWiki, no code is safe. Any random file can get re-contextualized, amplified, and presented as fact without any human oversight. Better not make any mistakes; you’ll have to give them your email address just to get their index updated.
People might say that I’m a bad developer. That I shouldn’t be committing code to the main branch that isn’t ready to use. That I should change the way I develop code so that Devin can use my work and trademarks as a sales funnel.
Screw that.
Lessons
I did not make this
From what I’ve read about Devin and the interactions I’ve seen from their (assumed) employees, I do not believe they are acting in good faith. They’re likely not interested in what I have to say.
However, I know these kinds of sites are going to keep being made until something sticks and generates revenue.
My hope is that future developers will read this and avoid my main issue with DeepWiki: opting projects in by default. I’m fine if developers want to use LLMs to help them explore code. The blast radius of misinformation in a chatbot is limited to one person. However, DeepWiki has non-consensually seeded the internet with a massive amount of misinformation for thousands of projects. Don’t do that.
Instead, let developers choose whether they want their projects included, and let them see what gets generated before that information is shared publicly. This approach would also avoid any potential legal issues due to trademark theft. Some developers actually seem to like DeepWiki, and that’s great. Maybe there’s real utility somewhere in there.
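To make the opt-in idea concrete, here is a minimal sketch of what consent-first indexing could look like. Everything in it is hypothetical: the marker file name and the functions are made up for illustration and are not part of DeepWiki or any real tool.

```python
# Hypothetical sketch of consent-first indexing.
# The marker file name and these functions are invented for illustration only.
from pathlib import Path

OPT_IN_MARKER = ".ai-docs-opt-in"  # hypothetical file a maintainer would commit to consent


def may_publish_docs(repo_root: str) -> bool:
    """Return True only if the maintainer has explicitly opted the repo in."""
    return (Path(repo_root) / OPT_IN_MARKER).exists()


if __name__ == "__main__":
    repo = "/tmp/example-repo"
    if may_publish_docs(repo):
        print("Opted in: generate docs, then let the maintainer review before publishing.")
    else:
        print("No opt-in marker: skip this repository entirely.")
```

The point of the sketch is simply that the default is to do nothing, and publication only happens after an explicit, reviewable choice by the maintainer.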
But for now, avoid DeepWiki. Moral issues aside, it gets too much wrong, too confidently, to be useful.