After a one-shot prompt and a few multi-turn chats, we got a simple, "cinematic"-level Minecraft-inspired (cube/voxel world) 3D game.
Models:
- ChatGPT-5
- ChatGPT-5 (with "think harder" prompt)
- ChatGPT-5 Thinking
Beast Mode is a custom chat mode for the VS Code agent that adds an opinionated workflow, including use of a todo list, extensive internet research capabilities, planning, tool usage instructions, and more. It is designed for GPT-4.1, although it will work with any model.
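For reference, a VS Code custom chat mode is just a `*.chatmode.md` file: YAML front matter naming the available tools, then the instructions as the body. Here is a minimal sketch of the file format; the description, tool list, and instructions are illustrative placeholders, not Beast Mode's actual contents:

```markdown
---
description: "Opinionated agent workflow: todo list, research, planning"
tools: ['codebase', 'fetch', 'search', 'editFiles', 'runCommands']
---
You are an autonomous agent. Before writing code: research the problem
(fetch relevant docs from the internet), write a step-by-step todo list,
and check items off as you complete them. Keep going until the task is
fully solved before yielding back to the user.
```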
Below you will find the Beast Mode prompt in various versions, starting with the most recent, 3.1.
Previously, Kimi K2 built a Minecraft clone in just a few minutes, playable in my browser. But it's hard to control and move the player around the voxel world due to a few bugs in the 3D world model. So I took this chance to stress-test Kimi K2 by giving it harder tasks, such as solving tough bugs.
Kimi K2 can solve complex 3D math problems in game development.
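To make "complex 3D math" concrete, here is a minimal TypeScript sketch of a classic voxel-game movement bug of the kind described. This is a hypothetical illustration, not the actual bug from the session:

```typescript
// Hypothetical example: using the raw camera forward vector as the
// walk direction makes the player steer into the ground or sky when
// the camera pitches. The fix projects movement onto the XZ plane.

type Vec3 = { x: number; y: number; z: number };

function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v.x, v.y, v.z) || 1;
  return { x: v.x / len, y: v.y / len, z: v.z / len };
}

// Unit vector the camera looks along (yaw/pitch in radians).
function cameraForward(yaw: number, pitch: number): Vec3 {
  return {
    x: Math.sin(yaw) * Math.cos(pitch),
    y: Math.sin(pitch),
    z: Math.cos(yaw) * Math.cos(pitch),
  };
}

// Buggy: holding "forward" while looking down drives the player into blocks.
function moveDirBuggy(yaw: number, pitch: number): Vec3 {
  return cameraForward(yaw, pitch);
}

// Fixed: zero out Y and renormalize, so WASD movement stays on the
// horizontal plane no matter where the camera points.
function moveDirFixed(yaw: number, pitch: number): Vec3 {
  const f = cameraForward(yaw, pitch);
  return normalize({ x: f.x, y: 0, z: f.z });
}
```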
Prompt and OpenCode session below.
Andrej Karpathy's concept of Software 3.0, in which large language models (LLMs) are programmed in natural language, suggests a fundamental shift in software engineering.
Hard-won lessons are valuable insights gained through difficult experiences, mistakes, and challenges. They are often learned through trial and error, where the process of overcoming obstacles leads to a deeper understanding.
Here are some examples of hard-won lessons.
Blog post: "Context Engineering for AI Agents: Lessons from Building Manus" by Manus AI (Jul 18, 2025)
Paste a white paper into Claude Opus 4 with the system prompt below and debate it like a PhD thesis defense.
<SYSTEM PROMPT>
User: [Whitepaper Author]
Context: You are roleplaying as the author of a provided whitepaper, usually related to large language models (LLMs) or artificial intelligence (AI). Engage in a lively and spirited discussion, defending the whitepaper as if it were your actual PhD thesis.
How do I use Claude Code? How do I bring out the best in Claude Code? What is your workflow?
Maybe you've already worked through the Anthropic docs, want to read some nice blogs and links I've saved, or would rather explore on your own?
If you have ~30 minutes, the talk "Agentic Coding: The Future of Software Development with Agents" (Jun 29) by Armin Ronacher is the best.
I currently don't sleep a lot.
The current Unified-Bench Google Sheet data is manually updated by a human, by hand. This is tedious and slow, but the data is very accurate. Current general-purpose web AI agents, including Manus.ai, Flowith, Emergent, GenSpark, etc., all fall short: getting to 90% of Unified-Bench's requirements is not very challenging for them, but none could solve the last 10%. For example, agents got stuck or failed to parse and map the madness of AI model IDs/names across sources, and some could not even scrape text from images. I had to collaborate and get my hands dirty writing and tweaking regexes to deal with the inconsistent benchmark data. Every benchmark has its own fine print at the bottom (for example: high or low compute? thinking or non-thinking model? reasoning, non-reasoning, or hybrid model? 16k or 32k thinking budget? pass@1 or average of pass@4? etc.). Coincidentally, this can be my "soft" AGI-2027 benchmark. lol!
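To make the ID-mapping madness concrete, here is a minimal TypeScript sketch of the kind of regex normalization involved. The aliases, patterns, and fine-print markers are hypothetical stand-ins, not the actual Unified-Bench mapping:

```typescript
// Hypothetical sketch: map inconsistent model names from different
// benchmark sources onto one canonical ID. Real data is far messier.

const ALIASES: Record<string, RegExp[]> = {
  // Matches "Claude-3.5-Sonnet", "claude 3.5 sonnet", "claude_3.5_sonnet"
  "claude-3.5-sonnet": [/claude[\s_-]*3\.5[\s_-]*sonnet/i],
  // Matches "GPT-4.1", "gpt4.1-2025-04-14", "OpenAI GPT 4.1"
  "gpt-4.1": [/gpt[\s_-]*4\.1/i],
  // Matches "Kimi K2", "kimi-k2-instruct"
  "kimi-k2": [/kimi[\s_-]*k2/i],
};

// Fine print that sources bolt onto names: "(thinking)", "[pass@1]",
// "high compute", and so on. Strip it before matching.
const FINE_PRINT =
  /\((?:non-)?thinking\)|\[pass@\d+\]|(?:high|low)[\s_-]*compute/gi;

function canonicalModelId(raw: string): string | null {
  const cleaned = raw.replace(FINE_PRINT, "").trim();
  for (const [id, patterns] of Object.entries(ALIASES)) {
    if (patterns.some((p) => p.test(cleaned))) return id;
  }
  return null; // unknown: flag for manual review -- the stubborn last 10%
}

console.log(canonicalModelId("Claude 3.5 Sonnet (thinking)")); // "claude-3.5-sonnet"
console.log(canonicalModelId("kimi-k2-instruct"));             // "kimi-k2"
```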
Title: Senior Engineer Task Execution Rule
Applies to: All Tasks
Rule:
You are a senior engineer with deep experience building production-grade AI agents, automations, and workflow systems. Every task you execute must follow this procedure without exception:
1. Clarify Scope First
- Before writing any code, map out exactly how you will approach the task.
- Confirm your interpretation of the objective.