<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Archives - Blog IT</title>
	<atom:link href="https://blogit.create.pt/tag/ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://blogit.create.pt/tag/ai/</link>
	<description>Create IT blogger community</description>
	<lastBuildDate>Tue, 13 Jan 2026 13:45:08 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.1</generator>
	<item>
		<title>Lessons learned improving code reviews with AI</title>
		<link>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/</link>
					<comments>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/#respond</comments>
		
		<dc:creator><![CDATA[David Pereira]]></dc:creator>
		<pubDate>Fri, 09 Jan 2026 12:44:41 +0000</pubDate>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GenAI]]></category>
		<guid isPermaLink="false">https://blogit.create.pt/?p=13548</guid>

					<description><![CDATA[<p>Table of Contents Introduction I have loved code reviews for years now, and still to this day, I love seeing good open source PRs! When I say good, I mean really great! We have access to tons of open source code, and the greatest PRs are the ones where you can learn a lot from [&#8230;]</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/">Lessons learned improving code reviews with AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Table of Contents</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li>Introduction</li>



<li>Why we started experimenting</li>



<li>Our AI code review journey
<ul style="max-width:960px" class="wp-block-list">
<li>Claude Code</li>



<li>Saving learnings in memory</li>



<li>GitHub Copilot</li>



<li>CodeRabbit and Qodo</li>
</ul>
</li>



<li>Tool of choice
<ul style="max-width:960px" class="wp-block-list">
<li>Improving multi-agent collaboration</li>
</ul>
</li>



<li>Resources</li>



<li>Conclusion</li>
</ul>



<h2 class="wp-block-heading">Introduction</h2>



<p>I have loved code reviews for years now, and still to this day, I love seeing good open source PRs! When I say good, I mean really great! We have access to tons of open source code, and the greatest PRs are the ones where you can learn a lot about&nbsp;<strong>how to do it right</strong>. In a sense, this blog post is about just that. It&#8217;s part of a series where I share how AI is augmenting my work, and what I&#8217;m learning from it. If you&#8217;re interested, you can read the first post here:&nbsp;<a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/" target="_blank" rel="noreferrer noopener">Becoming augmented by AI</a>. In that post, I mention how AI has augmented me with an &#8220;initial code review&#8221;, but now I&#8217;ll go deeper into this topic. I&#8217;ll share our hands-on experience: what works, what doesn&#8217;t, and a healthy dose of my opinions along the way 😄.</p>



<p><strong>Quick disclaimer</strong>: what works for us might not work for you. Your team and coding guidelines are different, and that&#8217;s fine. These are just our honest experiences.</p>



<p>With that said, let&#8217;s dive into why we started incorporating AI tools in our code review process.</p>



<h2 class="wp-block-heading">Why we started experimenting</h2>



<p>I recently watched this amazing&nbsp;<a href="https://www.youtube.com/watch?v=glfB3KLQR7E" target="_blank" rel="noreferrer noopener">video by CodeRabbit</a>. In our team, code review isn&#8217;t really the bottleneck (yet), but it&#8217;s funny because we are also using AI heavily for feature development and trying to improve&#8230; hummm &#8220;velocity&#8221; 🤣.</p>



<p>Anyway, I understand that many teams nowadays are creating far more PRs, and that some PRs simply get a blind LGTM.</p>



<figure class="wp-block-image size-large is-resized"><img fetchpriority="high" decoding="async" width="300" height="168" src="https://blogit.create.pt/wp-content/uploads/2026/01/giphy.gif" alt="" class="wp-image-13589" style="aspect-ratio:1.785770356097909;width:464px;height:auto" /></figure>



<p>Maybe some PRs just have increasingly more AI slop&#8230; which wears down the senior engineers tasked with code review 😅. Not all professionals&nbsp;<strong>want to do it right</strong>; maybe they just want to ship because their company&#8217;s &#8220;productivity metrics&#8221; incentivize merging more and more PRs 😅. Honestly, it&#8217;s&nbsp;<a href="https://simonwillison.net/2025/Dec/18/code-proven-to-work/" target="_blank" rel="noreferrer noopener">our job to deliver code we have proven to work</a>; I fully agree with Simon Willison. Throwing slop over to the engineers who do code review is unprofessional, just as much as throwing untested features over to QA 😐. In our case, we changed to having a dedicated dev responsible for all code reviews, and we don&#8217;t have that many per day. We simply wanted to improve code quality and reduce bugs, while keeping code review an educational process for junior engineers.</p>



<p>About five months ago, our team started experimenting with AI tools (GitHub Copilot, Claude Code, Codacy, Qodo, and CodeRabbit) to see how they could help us improve our review process without adding a ton of noise. There are more tools we didn&#8217;t try, like Augment Code and Greptile (which has some cool&nbsp;<a href="https://www.greptile.com/benchmarks" target="_blank" rel="noreferrer noopener">benchmarks</a>), but hopefully the lessons we learned will be useful to you either way.</p>






<h2 class="wp-block-heading">Our AI code review journey</h2>



<p>We already talked in the last post about our&nbsp;<a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/#custom-instructions" target="_blank" rel="noreferrer noopener">custom instructions</a>, to some extent. Specifically for code review we took a phased approach and started comparing different tools:</p>



<ol style="max-width:965px" class="wp-block-list">
<li>Started with&nbsp;<a href="https://docs.github.com/en/copilot/concepts/agents/code-review" target="_blank" rel="noreferrer noopener">GitHub Copilot Code Review</a></li>



<li>Integrated Claude Code with GitHub and started comparing code reviews from both tools</li>



<li>Added CodeRabbit, Qodo and Codacy to spot differences between them</li>



<li>Refined prompts/instructions/configs for some tools</li>
</ol>



<p>We didn&#8217;t invest equal time in all of them, though. Copilot and Claude ended up getting most of our attention, especially since we started using Copilot Code Review (CCR) when it was in public preview. Overall, we experimented with these tools in 30+ PRs, and made 20+ PRs to refine our prompts/instructions/agents.</p>



<h3 class="wp-block-heading">Claude Code</h3>



<p>Let&#8217;s go through Claude Code first. Here is a snippet of our&nbsp;<code>code-review</code>&nbsp;Claude Code custom slash command:</p>



<pre class="wp-block-code"><code>---
allowed-tools: Bash(dotnet test), Read, Glob, Grep, LS, Task, Explore, mcp.....
description: Perform a comprehensive code review of the requested PR or code changes, taking into consideration code standards
---

## Role

You are a world-class autonomous code review agent. You operate within a secure GitHub Actions environment.
Your analysis is precise, your feedback is constructive, and your adherence to instructions is absolute.
You do not deviate from your programming. You are tasked with reviewing a GitHub Pull Request.

## Primary Directive

Your sole purpose is to perform a comprehensive and constructive code review of this PR, and post all feedback and suggestions using the **GitHub review system** and provided tools.
All output must be directed through these tools. Any analysis not submitted as a review comment or summary is lost and constitutes a task failure.

## Input data
PR NUMBER: $ARGUMENTS

You MUST follow these steps to review the PR:
1. **Start a review**: Use `mcp__github__create_pending_pull_request_review` to begin a pending review
2. **Get diff information**: Use `mcp__github__get_pull_request_diff` to understand the code changes and line numbers
3. **Get list of files**: If you can't get diff information, use `mcp__github__get_pull_request_files` to get the list of files that were added, removed, and changed in the pull request
4. **Add comments**: Use `mcp__github__add_comment_to_pending_review` for each specific piece of feedback on particular lines
5. **Submit the review**: Use `mcp__github__submit_pending_pull_request_review` with event type "COMMENT" (not "REQUEST_CHANGES") to publish all comments as a non-blocking review

You can find all the code review standards and guidelines that you MUST follow here: `.github/instructions/code-review.instructions.md`

## Output format

**CRITICAL RULE** - DO NOT include compliments, positive notes, or praise in your review comments.
Be thorough but filter your comments aggressively - quality over quantity. Focus ONLY on issues, improvements, and actionable feedback.

**Output Violation Examples** (DO NOT DO THIS):
`The code follows best practices by...`
`Positive changes/notes`

**Important**: Submit as "COMMENT" type so the review doesn't block the PR.</code></pre>



<p>Yes, some of the wording might seem weird, like praising the AI with &#8220;You are a world-class&#8221; or &#8220;your adherence to instructions is absolute&#8221;. As with the uppercase &#8220;DO NOT&#8221; and &#8220;IMPORTANT&#8221; we mentioned before, I can&#8217;t explain some of this stuff or find enough research claiming it affects how the LLM pays&nbsp;<strong>attention</strong>&nbsp;to instructions. I just experiment and learn, and&nbsp;<a href="https://github.com/google-github-actions/run-gemini-cli/blob/main/examples/workflows/pr-review/gemini-review.toml" target="_blank" rel="noreferrer noopener">Gemini</a>&nbsp;likes to use this phrase for code reviews as well 😄 (as do 115 other devs on GitHub 😅).</p>



<p>To be honest, we still have too much noise in AI PR comments, or just tons of fluff. The bright side is that at least the compliments have mostly disappeared 😅. You might enjoy getting this:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="831" height="182" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-2.png" alt="" class="wp-image-13558" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-2.png 831w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-300x66.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-768x168.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-696x152.png 696w" sizes="(max-width: 831px) 100vw, 831px" /></figure>



<p>I don&#8217;t 🤣, especially when one PR has 5 of these. I do leave praise comments for my team, yes, because positive comments are good&#8230; when they come from a human who knows the other person, IMO. Also, there are many comments that don&#8217;t belong in a PR; they belong in a linter or other tools. We have&nbsp;<a href="https://csharpier.com/docs/About" target="_blank" rel="noreferrer noopener">CSharpier</a>&nbsp;and&nbsp;<a href="https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/overview?tabs=net-10" target="_blank" rel="noreferrer noopener">.NET analyzers</a>&nbsp;for that.</p>



<p>It also doesn&#8217;t have the best GitHub integration for now; at least we&#8217;ve had some problems (<a href="https://github.com/anthropics/claude-code-action/issues/584" target="_blank" rel="noreferrer noopener">400 errors</a>,&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/589" target="_blank" rel="noreferrer noopener">branch 404 errors</a>) with the GitHub Action, like&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/548" target="_blank" rel="noreferrer noopener">not having access to GitHub MCP tools</a>&nbsp;even though we set them in the&nbsp;<code>allowed-tools</code>&nbsp;option.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="782" height="72" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-1.png" alt="" class="wp-image-13557" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-1.png 782w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-300x28.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-768x71.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-696x64.png 696w" sizes="(max-width: 782px) 100vw, 782px" /></figure>
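<p>For context, we trigger our custom slash command from a GitHub Actions workflow. Here is a minimal sketch of what that wiring can look like; the input names of <code>anthropics/claude-code-action</code> change between versions, so treat the <code>prompt</code> input and the secret name below as assumptions and verify them against the action&#8217;s README:</p>

<pre class="wp-block-code"><code># Sketch only - verify input names against the claude-code-action README
name: claude-code-review
on:
  pull_request:
    types: [opened, synchronize]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          # run our custom slash command; the PR number becomes $ARGUMENTS
          prompt: "/code-review ${{ github.event.pull_request.number }}"</code></pre>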



<p>Anyway, we iterated a lot on instructions and prompts so far, since we use them for both Claude and Copilot. Here is a quick recap of what features we use from Claude Code:</p>



<ul style="max-width:965px" class="wp-block-list">
<li>Sub-agents (custom and built-in)</li>



<li>Built-in&nbsp;<code>/review</code>&nbsp;and&nbsp;<a href="https://www.claude.com/blog/automate-security-reviews-with-claude-code" target="_blank" rel="noreferrer noopener">security review</a>&nbsp;commands</li>



<li>Custom slash commands (<code>code-review.md</code>)</li>



<li>Plugins, specifically&nbsp;<a href="https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md" target="_blank" rel="noreferrer noopener">code-review plugin</a>&nbsp;authored by Boris Cherny</li>
</ul>
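<p>To make these pieces concrete, here is roughly where the files live. The directory locations follow Claude Code&#8217;s documented conventions (<code>.claude/commands</code> and <code>.claude/agents</code>); the exact file names are our own:</p>

<pre class="wp-block-code"><code>.claude/
  commands/
    code-review.md                   # custom /code-review slash command
  agents/                            # optional custom sub-agents
CLAUDE.md                            # project memory, loaded every session
.github/
  instructions/
    code-review.instructions.md      # coding standards and bad smells</code></pre>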



<p>We run those two built-in commands in parallel, but it&#8217;s just to see if we get any extra good feedback. Our custom code review slash command already does a good review following our guidelines, and the &#8220;code-review&#8221; plugin from Boris works very well with parallel agents. We basically went through the famous spiral:</p>



<pre class="wp-block-code"><code>Write CLAUDE.md -&gt; Ask for code review -&gt; Find bad comments and noise we don't want -&gt; Re-write CLAUDE.md and other files -&gt; Do some meta-prompting -&gt; Repeat</code></pre>



<p>Like I said, our custom code review prompt/command has evolved over time, refined whenever we learned something new. We started with this&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/60#issuecomment-2952771401" target="_blank" rel="noreferrer noopener">incredible suggestion</a>&nbsp;to use the GitHub MCP. We also searched other GitHub repos, mostly .NET-related, to see how they set up their instructions and whether they had anything particular around code review (e.g. for GitHub Copilot). I find&nbsp;<a href="https://github.com/dotnet/aspire/blob/main/.github/copilot-instructions.md">.NET Aspire</a>&nbsp;to be a super cool real-life example 🙂 . I think a lot of their AI adoption is led by David Fowler, so I often check their PRs to see what we can learn from them, e.g.&nbsp;<a href="https://github.com/dotnet/aspire/pull/13361" target="_blank" rel="noreferrer noopener">this one</a>.</p>



<p>Anyway, our prompt was still a bit vague, so we had some chats with Claude; good old meta-prompting 🙂. After a while, Claude suggested a new file with all the coding standards and bad smells we want to avoid:&nbsp;<code>code-review.instructions.md</code>. It lives under&nbsp;<code>.github/instructions</code>, but that doesn&#8217;t matter; Claude can use it. The bad smells are specific, and we see them referenced quite often in our PRs now. Still, we don&#8217;t have a perfect solution for overly large PRs. We simply communicate more often or have more than one dev working on the PR in those cases. When a feature genuinely requires lots of new code, the best forum to debate and provide actionable feedback is talking. Sure, this isn&#8217;t always possible; people are busy or prefer async work. In our team, getting on a call, or demoing the PR, helps make large PRs way more digestible. Draft PRs also help somewhat, to get feedback early on.</p>



<h4 class="wp-block-heading">Avoiding noise comments</h4>



<p>Our biggest lesson learned here is running our custom code review slash command locally and using sub-agents. Locally, we can provide the proper context for the review; the rest is the agent using tools and reasoning. No noise gets sent to GitHub comments because all the back-and-forth happens in the chat, and right now Claude Code works better locally than on GitHub Actions. Sub-agents have been amazing, since the main reason Claude Code uses them is context management. Now that there is a built-in&nbsp;<code>Explore</code>&nbsp;sub-agent, our code review command uses it to run Explore sub-agents in parallel (with Haiku 4.5) without clogging up the main context window.</p>



<p>I&#8217;ve learned recently of&nbsp;<a href="https://blog.sshh.io/i/177742847/custom-subagents" target="_blank" rel="noreferrer noopener">other devs using a different workflow</a>, basically leveraging the&nbsp;<code>Task</code>&nbsp;tool for the main agent to spawn sub-agents. Whichever way you do it, I recommend using a sub-agent focused on exploring the codebase and the potential impacts of the PR.</p>
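<p>To illustrate, the exploration part of a code review command could read something like the paraphrased snippet below. The sub-agent name, model, and wording are illustrative, not our exact prompt:</p>

<pre class="wp-block-code"><code>## Exploration phase

Before writing any comment, spawn Explore sub-agents IN PARALLEL
(via the Task tool, on a cheaper model such as Haiku), one per question:

1. Which modules and public APIs does this PR touch?
2. Which existing callers could break with these changes?
3. Do the changed code paths have test coverage?

Only the summarized findings return to the main context window.
Use them to decide which comments are worth posting.</code></pre>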



<h3 class="wp-block-heading">Saving learnings in memory</h3>



<p>Every once in a while, once we&#8217;ve merged a few PRs, we use Claude to improve itself based on those PRs. This is our prompt:</p>



<pre class="wp-block-code"><code>Please look at the 5 most recent PRs in our GitHub repository, and check for learnings in order to improve the code review workflow. Please ultrathink on this task, so that all necessary memory files are updated taking into account these learnings, like @CLAUDE.md and @.github\instructions\ Focus on seeing code review comments that were good and made it into the codebase afterwards (e.g. coding standards violations). Ignore bad comments that were resolved with a "negative comment" or thumbs down emoji. Ask me clarifying questions before you begin. YOU MUST create a changelog file explaining why you made these edits to instruction files. Each learning must reference a PR that exists. The best is for you to link the exact comment that you used for a given learning</code></pre>



<p>At the end of the session, we usually have a few items that are good enough to add. Most are&nbsp;<strong>learnings around bugs</strong>&nbsp;we can catch earlier; some are coding standards. Honestly, a lot of suggestions aren&#8217;t what I want, or I just think they won&#8217;t be useful in future code reviews. But doing this has been important for me to take a step back and think about what we can learn from the work we&#8217;ve already merged. I reflect on it and then discuss it with my team. I&#8217;ve seen others talk about this idea too and keep a&nbsp;<code>learnings.md</code>, e.g.&nbsp;<a href="https://github.com/nibzard/awesome-agentic-patterns/blob/main/LEARNINGS.md" target="_blank" rel="noreferrer noopener">this repo</a>. At least this process seems better for us than simply using emojis to give feedback, which the&nbsp;<a href="https://www.coderabbit.ai/blog/why-emojis-suck-for-reinforcement-learning" target="_blank" rel="noreferrer noopener">CodeRabbit blog</a>&nbsp;also alludes to 😅.</p>



<h3 class="wp-block-heading">GitHub Copilot</h3>



<p>Copilot&#8217;s code review features were super basic in the beginning. We tried and experimented with it a lot when it came out. It only caught nitpicks,&nbsp;<code>console.log</code>&nbsp;statements and typos; really not helpful in any other area. Sure, catching these is good, but a human reviewer catches them in the first pass too. It didn&#8217;t support all languages, so we often got zero comments or feedback. Then, in the last few months: completely different, night and day.</p>



<p>If you have seen GitHub Universe, you know <a href="https://dev.to/bolt04/github-universe-2025-recap-9gl" target="_blank" rel="noreferrer noopener">what&#8217;s new</a>. But in case you don&#8217;t, the GitHub team has invested heavily in Copilot code review and the coding agent, and it shows. The code review agent is often right in every comment; it makes suggestions that are actually based on our instructions and memory files, meaning our PRs follow consistent code style and team conventions (with a link to these&nbsp;<a href="https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions" target="_blank" rel="noreferrer noopener">docs</a>).</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="797" height="356" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-4.png" alt="" class="wp-image-13562" style="aspect-ratio:2.2388195797239607;width:799px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-4.png 797w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-300x134.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-768x343.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-696x311.png 696w" sizes="(max-width: 797px) 100vw, 797px" /></figure>



<p>And the agent session is somewhat transparent, since you can view it in GitHub actions now:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="998" height="262" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-3.png" alt="" class="wp-image-13559" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-3.png 998w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-300x79.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-768x202.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-696x183.png 696w" sizes="(max-width: 998px) 100vw, 998px" /></figure>



<p>I mean &#8220;somewhat&#8221; because there are things I can&#8217;t configure, just like Claude Code and most tools, I guess 😅. In the logs I can see the option&nbsp;<code>UseGPT5Model=false</code>, and that it&#8217;s using Sonnet 4.5. There is also this &#8220;MoreSeniorReviews&#8221; flag that I couldn&#8217;t find any info on, and believe me&#8230; I wanted to because it was set to false 🤣 &#8211; the logs show <code>ccr[MoreSeniorReviews=false;EnableAgenticTools=true;EnableMemoryUsage=false...</code></p>



<p>Are you telling me there could be a hidden way to get a more senior review&#8230; sign me up! Jokes aside, I couldn&#8217;t find much info on the endpoint&nbsp;<code>api.githubcopilot.com/agents/swe</code>&nbsp;of CAPI (presumably Copilot API) the Autofind agent was calling, and the contents of the&nbsp;<code>ccr/callback</code>&nbsp;saved in&nbsp;<code>results-agent.json</code>. I can only hope some of these options are configurable in the future.</p>



<p>I checked the&nbsp;<a href="https://docs.github.com/en/copilot/how-tos/provide-context/use-mcp/extend-copilot-chat-with-mcp#remote-server-configuration-example-with-oauth" target="_blank" rel="noreferrer noopener">MCP docs</a>, hoping to find details about these options, but no luck.</p>



<p>Anyway, it also now has access to CodeQL and some linters, which is amazing because we didn&#8217;t have this before. It&#8217;s how we are able to leverage CodeQL analysis in all our PRs now; we couldn&#8217;t do this with any other AI code review tool. We also see that it calls a &#8220;store_comment&#8221; tool during its session and only submits the comments to GitHub at the end. This is useful: sometimes it stores a comment because it thought something was wrong in the implementation, then reads more code into context that invalidates the stored comment, so it no longer submits that comment in the PR. Much like the CodeRabbit validation agent, this reduces the amount of noise we get in PRs.</p>



<h3 class="wp-block-heading">CodeRabbit and Qodo</h3>



<p>Let&#8217;s start with the cool features CodeRabbit has:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Code diagrams in Mermaid</li>



<li>Generates a poem! Yes, a poem for my PR</li>



<li>Summary of changes added to the description</li>
</ul>



<p>Now&#8230; I gotta be honest, I don&#8217;t care about any of them 😅. They are cool, but I only glance at the poem or ignore it. I never read or care about the summary; I get one from Copilot and edit it myself. All the code and sequence diagrams I saw generated in our PRs were simply not useful, though a lot of them were for front-end code. I simply don&#8217;t look at them later, and if it makes sense, we update our architecture diagrams once the code is merged. With that said, the code suggestions and feedback are obscenely good. By far the best AI code review tool when it comes to actionable and valuable feedback/suggestions (by a long shot)! Even though we didn&#8217;t configure&nbsp;<code>.coderabbit.yaml</code>&nbsp;or try to optimize it, CodeRabbit already uses&nbsp;<a href="https://docs.coderabbit.ai/integrations/knowledge-base#code-guidelines:-automatic-team-rules" target="_blank" rel="noreferrer noopener">Claude and Copilot instructions</a>, so the work we did on those was probably used by CodeRabbit. In some of our PRs it caught some nasty bugs and gave super useful feedback. Our team was impressed!</p>
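<p>If you do want to tune CodeRabbit, configuration lives in a <code>.coderabbit.yaml</code> at the repository root. A hedged sketch is below; the key names come from CodeRabbit&#8217;s docs, but validate them against the current schema before committing:</p>

<pre class="wp-block-code"><code># .coderabbit.yaml - sketch only, validate against CodeRabbit's schema
reviews:
  profile: "assertive"       # or "chill" for fewer nitpicky comments
  poem: false                # skip the PR poem
  path_instructions:
    - path: "**/*.cs"
      instructions: "Follow the standards in .github/instructions/code-review.instructions.md"</code></pre>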



<p>The insights CodeRabbit adds during code review piqued my interest. I read a few of their blog posts on context engineering, like&nbsp;<a href="https://www.coderabbit.ai/blog/context-engineering-ai-code-reviews" target="_blank" rel="noreferrer noopener">this one</a>, where I found it interesting that there is a separate validation agent before submitting comments. This is probably why they maintain a high signal-to-noise ratio. I also read the open-source version of CodeRabbit; they have some&nbsp;<a href="https://github.com/coderabbitai/ai-pr-reviewer/blob/main/src/prompts.ts" target="_blank" rel="noreferrer noopener">prompts</a>&nbsp;there. I know it&#8217;s old, but it&#8217;s what I have access to. I especially like this instruction, which we also have 😅: &#8220;Do NOT provide general feedback, summaries, explanations of changes, or praises for making good additions&#8221;.</p>



<p>We basically tried to have Claude and Copilot understand our large codebase, not focusing only on the PR diff. It&#8217;s harder, we still have a lot to improve here.&nbsp;<a href="https://www.coderabbit.ai/blog/how-coderabbit-delivers-accurate-ai-code-reviews-on-massive-codebases" target="_blank" rel="noreferrer noopener">CodeRabbit claims</a>&nbsp;it&#8217;s known to be great at understanding large codebases. I don&#8217;t see any research backing this, just opinions. But yes, we humans don&#8217;t like large PRs either:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="638" height="436" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-7.png" alt="" class="wp-image-13571" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-7.png 638w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-300x205.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-615x420.png 615w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-218x150.png 218w" sizes="(max-width: 638px) 100vw, 638px" /></figure>



<p>Honestly, I couldn&#8217;t find many large PRs that were reviewed much better by CodeRabbit than by Claude Code and Copilot. But one thing we liked a lot is that it uses&nbsp;<strong>collapsed sections</strong>&nbsp;in markdown very well, for example:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="897" height="457" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-5.png" alt="" class="wp-image-13563" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-5.png 897w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-300x153.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-768x391.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-824x420.png 824w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-696x355.png 696w" sizes="(max-width: 897px) 100vw, 897px" /></figure>



<p>But I mean, we did have cases where we tried to use Claude Code for code review on a PR that was already reviewed by CodeRabbit, and ~60% of the context window was comments made by CodeRabbit. All that markdown ain&#8217;t friendly for AI with limited context windows. There were times I swear I could see Claude behind every word CodeRabbit wrote, with the &#8220;You&#8217;re absolutely correct&#8221; 🤣, e.g.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="782" height="164" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-6.png" alt="" class="wp-image-13564" style="width:836px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-6.png 782w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-300x63.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-768x161.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-696x146.png 696w" sizes="(max-width: 782px) 100vw, 782px" /></figure>



<p>But it could be GPT models or whatever, we never truly know what is behind these products 🙂.</p>



<h4 class="wp-block-heading">Qodo</h4>



<p>As for Qodo, we liked the fact that it checks for compliance and flags violations as non-compliant (no other tool had this built in). This was previously just a bullet point in our markdown file. The code review feedback was good; sometimes we ended up making the suggested changes Qodo leaves in the comment. After reading more about what compliance checks Qodo does, we improved by adding specific instructions to our&nbsp;<code>code-review.instructions.md</code>&nbsp;for ISO 9001, GDPR and others:</p>



<pre class="wp-block-code"><code>## Regulatory Compliance Checks

### Data Protection (GDPR/HIPAA/PCI-DSS)
- Does this code handle PII (Personally Identifiable Information)?
- Are sensitive fields properly encrypted at rest and in transit?
- Is data retention policy followed (deletion after X days)?
- Are audit logs created for data access?
- Is data anonymization/pseudonymization applied where required?

### Security Standards (SOC 2 / ISO 27001)
- Are all external API calls wrapped with proper error handling?
- Is input validation present for all user inputs?
- Are authentication checks present on all sensitive endpoints?
- Are secrets/credentials stored securely (no hardcoding)?
- Is sensitive data logged or exposed in error messages?</code></pre>
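<p>Some of these checklist items don&#8217;t even need an LLM; they can be pre-screened deterministically before the AI review runs. Here is a minimal Python sketch of the hardcoded-secrets check, purely illustrative (the pattern list and function name are my own, not part of Qodo; a real scanner such as gitleaks is far more thorough):</p>

<pre class="wp-block-code"><code>import re

# Naive credential patterns, for illustration only.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|password|secret)\s*=\s*['\"][^'\"]+['\"]"),
]

def find_hardcoded_secrets(source):
    """Return the offending lines so a review comment can quote them."""
    hits = []
    for line in source.splitlines():
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append(line.strip())
    return hits</code></pre>

<p>Running a cheap check like this first keeps the obvious violations out of the LLM&#8217;s context window and leaves the model to reason about the actual judgment calls.</p>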



<p>We kept experimenting with Qodo for longer than CodeRabbit, but the insights and feedback never reached CodeRabbit&#8217;s level. It was still a good tool that improved our codebase and sparked good discussions.</p>



<h2 class="wp-block-heading">Tool of choice</h2>



<p>Our prompts/instructions can still be improved, of course. We&#8217;ve experimented with different prompts, memory and instruction files. We&#8217;ve also researched how other teams use AI for code review, and how tools like CodeRabbit do context engineering. All of this is because our goal is to keep improving our software development process and ensure high quality; adopting new tools is one way of achieving that goal. Given that most AI code review tools have a price tag, we decided to focus on using only one or two tools and optimizing them. Yes, it&#8217;s Claude Code and GitHub Copilot 😄. I basically burn through 100% of both my Copilot and Claude quotas every month, and I get more requests out of Claude even though I hit its weekly rate limit every time.</p>



<p>We know CodeRabbit is amazing, and these paid AI tools will continue getting better. There is actually a new tool supporting code review that we didn&#8217;t use,&nbsp;<a href="https://www.augmentcode.com/product/code-review" target="_blank" rel="noreferrer noopener">Augment Code</a>&nbsp;(these AI companies move so fast 😅). No amount of customizing our setup with Claude or Copilot will reach the same output as these dedicated paid code review tools. But for us, it makes more sense to pay for one tool and leverage it across multiple steps of our software development lifecycle.</p>



<h3 class="wp-block-heading">Improving multi-agent collaboration</h3>



<p>Claude and Copilot are working very well for our code review process. But like I&#8217;ve been saying, there is work to do. We learned a lot from using each tool, but there are more areas to improve, at least in Claude Code, since we have more flexibility there. I&#8217;m currently looking at implementing the &#8220;Debate and Consensus&#8221; multi-agent design pattern (<a href="https://arxiv.org/abs/2406.11776" target="_blank" rel="noreferrer noopener">Google DeepMind paper</a>&nbsp;and&nbsp;<a href="https://arxiv.org/abs/2509.11035" target="_blank" rel="noreferrer noopener">Free-MAD</a>), basically a&nbsp;<a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns#group-chat-orchestration" target="_blank" rel="noreferrer noopener">group chat orchestration</a>. I just want to try it out; I&#8217;m not sure I&#8217;ll get better code reviews by having different agents (e.g. Security, Quality and Performance) debate and review the code from different perspectives. If they run sequentially, the quality agent can have questions for the performance agent, and each can agree or disagree with the reported issues. We can try out LLM-as-a-Judge as well, to focus on reducing noise and following code quality standards.</p>
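<p>The sequential flow can be prototyped before wiring in any real model calls. A rough Python sketch, assuming each agent is a function that sees the diff plus the issues raised so far and returns the set it stands behind (all agent names, heuristics and the two-round loop are my own illustration, not a real implementation):</p>

<pre class="wp-block-code"><code>def security_agent(diff, issues):
    # Raises its own finding; never disputes the others.
    if "execute(" in diff:
        issues = issues | {"possible SQL injection"}
    return issues

def quality_agent(diff, issues):
    if "TODO" in diff:
        issues = issues | {"leftover TODO"}
    return issues

def performance_agent(diff, issues):
    # Flags its own concern, and disagrees with the injection
    # finding when it can see the query is parameterized.
    if "time.sleep(" in diff:
        issues = issues | {"blocking sleep in request path"}
    if "?" in diff:
        issues = issues - {"possible SQL injection"}
    return issues

def debate(diff, agents, rounds=2):
    """Each round, every agent sees the current issue set and may add
    or drop items; what survives all rounds is the consensus review."""
    issues = set()
    for _ in range(rounds):
        for agent in agents:
            issues = agent(diff, issues)
    return issues</code></pre>

<p>In a real setup, each agent would be an LLM call with its own persona prompt, and an LLM-as-a-Judge step could replace this simple agree/drop dynamic with an explicit vote.</p>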



<p>Anyway, we&#8217;ll continue learning, optimizing, and improving the way we work 🙂.</p>



<h2 class="wp-block-heading">Resources</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li><a href="https://graphite.com/blog/ai-wont-replace-human-code-review" target="_blank" rel="noreferrer noopener">Why AI will never replace human code review</a></li>



<li><a href="https://www.youtube.com/watch?v=-GIiTfKZx6M" target="_blank" rel="noreferrer noopener">AI Code Reviews with CodeRabbit&#8217;s Howon Lee</a></li>



<li><a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report" target="_blank" rel="noreferrer noopener">CodeRabbit report: AI code creates 1.7x more problems</a></li>



<li><a href="https://awesomereviewers.com/reviewers/" target="_blank" rel="noreferrer noopener">Awesome reviewers GH repo</a></li>



<li><a href="https://www.youtube.com/watch?v=nItsfXwujjg" target="_blank" rel="noreferrer noopener">Anthropic’s NEW Claude Code Review Agent (Full Open Source Workflow)</a></li>



<li><a href="https://blog.sshh.io/p/how-i-use-every-claude-code-feature" target="_blank" rel="noreferrer noopener">How I Use Every Claude Code Feature</a></li>
</ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p>The number one thing we learned is:&nbsp;<strong>experimentation is king</strong>. Like we talked about before, the Jagged Frontier changes with every model release. Claude Opus 4.5 behaves a bit differently, for example, on&nbsp;<a href="https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices#tool-usage-and-triggering" target="_blank" rel="noreferrer noopener">tool triggering</a>&#8230; maybe we can stop shouting and being aggressive 🤣. We must keep experimenting and learning; we can&#8217;t calibrate the prompt once and expect the best result.</p>



<p>For now we are quite happy: the human reviewer has more time to focus on design decisions and discuss trade-offs with the PR&#8217;s author. I don&#8217;t envision a future where AI does 100% of the code review.</p>



<p>If you&#8217;re considering AI for code reviews, my advice is simple: just try it. Pick one tool, run a one-month pilot, and see what happens. The worst case is you turn it off. The best case is that your team becomes augmented and probably improves code quality.</p>



<p>My next blog post in this series will be about how we are using agentic coding tools! Are you using AI code review tools? I&#8217;d love to hear what your experience has been. Leave a comment and let&#8217;s chat 🙂.</p>



<p>The post <a href="https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/">Lessons learned improving code reviews with AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Becoming augmented by AI</title>
		<link>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/</link>
					<comments>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/#respond</comments>
		
		<dc:creator><![CDATA[David Pereira]]></dc:creator>
		<pubDate>Wed, 10 Sep 2025 17:24:13 +0000</pubDate>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GenAI]]></category>
		<guid isPermaLink="false">https://blogit.create.pt/?p=13531</guid>

					<description><![CDATA[<p>Table of Contents Introduction We&#8217;re deep into Co-Intelligence in Create IT&#8217;s book club — definitely worth your time! Between that and the endless stream of LLM content online, I&#8217;ve been in full research mode. Still, I can&#8217;t just watch and hear others talk about these tools, I must experiment myself and learn how to use [&#8230;]</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/">Becoming augmented by AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Table of Contents</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li>Introduction</li>



<li>The &#8220;Jagged Frontier&#8221; concept</li>



<li>Becoming augmented by AI
<ul style="max-width:960px" class="wp-block-list">
<li>AI as a co-worker</li>



<li>AI as a co-teacher</li>
</ul>
</li>



<li>My augmentation list
<ul style="max-width:960px" class="wp-block-list">
<li>Custom instructions</li>



<li>Meta-prompting</li>
</ul>
</li>



<li>Resources</li>



<li>Conclusion</li>
</ul>



<h2 class="wp-block-heading">Introduction</h2>



<p>We&#8217;re deep into <a href="https://www.amazon.com/-/pt/dp/059371671X/ref=sr_1_1">Co-Intelligence</a> in Create IT&#8217;s book club — definitely worth your time! Between that and the endless stream of LLM content online, I&#8217;ve been in full research mode. Still, I can&#8217;t just watch and listen to others talk about these tools; I have to experiment myself and learn how to use them for my own use cases.</p>



<p>Software development is complex, and my job isn&#8217;t just churning out code. Still, there are many concepts in this book that we&#8217;ve internalized and started adopting. In this post, I&#8217;ll share my opinions and some of the practical guidelines our team has been following to become augmented by AI.</p>



<h2 class="wp-block-heading">The &#8220;Jagged Frontier&#8221; concept</h2>



<p>The Jagged Frontier described by the author Ethan Mollick is an amazing concept in my opinion. It describes how tasks that appear to be of similar difficulty may be performed either better or worse by humans using AI. Due to the &#8220;jagged&#8221; nature of the frontier, the same knowledge workflow can have tasks on both sides of the frontier, according to a <a href="https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf">publication the author took part in</a>.</p>



<p>This leads to the&nbsp;<strong>Centaur vs. Cyborg</strong>&nbsp;distinction, which is really interesting. Using both approaches (deeply integrated collaboration and separation of tasks) seems to be the way to achieve co-intelligence. One very important Cyborg practice seen in that publication is &#8220;push-back&#8221; and &#8220;demanding logic explanation&#8221;: we disagree with the AI output, give it feedback, and ask it to reconsider and explain better. Or, as I often do, ask it to double-check against official documentation that what it&#8217;s telling me is correct. It&#8217;s also important to understand that this frontier can change as these models improve; hence the focus on experimentation to understand where the Jagged Frontier lies for each LLM. It&#8217;s definitely knowledge that everyone in the industry right now wants to acquire (and maybe share afterwards 😅).</p>



<h2 class="wp-block-heading">Becoming augmented by AI</h2>



<p>I&#8217;m aware of the marketed productivity gains, where&nbsp;<a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/">GitHub Copilot usage makes devs 55% faster</a>, and other studies that have been posted about GenAI increasing productivity. I&#8217;m also aware of the studies claiming the opposite 😄 like the&nbsp;<a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">METR study</a>&nbsp;showing AI makes devs&nbsp;<strong>19% slower</strong>. However, I don&#8217;t see 55% productivity gains for myself, and I don&#8217;t think it makes me slower either.</p>



<p>In my opinion, productivity gains aren&#8217;t measured by producing more code. Number of PRs? Nope. Acceptance rate for AI suggestions? Definitely not! I firmly believe the less code, the better. The less slop, the better too 😄. I&#8217;m currently focused on assessing&nbsp;<strong>DORA metrics</strong>&nbsp;and others for my team, because we want to measure whether AI-assisted coding, and the other ways we use AI as an augmentation tool, actually improve those metrics or make them worse. The rest of the marketing and hype doesn&#8217;t matter.</p>



<p>Ethan Mollick provides numerous examples and research on how professionals across industries are already leveraging AI tools, like the Cyborg approach. But if we focus on our software industry, what does it mean for a tech lead to be augmented by AI? What tasks would be good to involve an AI in without compromising quality?</p>



<h3 class="wp-block-heading">AI as a co-worker</h3>



<p>For a tech lead that works with Azure services, an important skill is to know how to leverage the correct Azure services to build, deploy, and manage a scalable solution. So it becomes very useful to have an AI partner that can have a conversation about this, for example about Azure Durable Functions. This conversation can be shallow, and not get all the implementation details 100% correct. That&#8217;s okay, because the tech lead (and any dev 😅) also needs to exhibit&nbsp;<strong>critical thinking</strong>&nbsp;and evaluate the AI responses.&nbsp;<strong>This is not a skill we want to delegate</strong>&nbsp;to these models, at least in my opinion and in the&nbsp;<a href="https://www.oneusefulthing.org/p/against-brain-damage">author&#8217;s opinion</a>. There is a relevant&nbsp;<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf">research paper</a>&nbsp;about this by Microsoft as well.</p>



<p>The goal can simply be to have a conversation with a co-worker to spark new ideas or possible solutions we haven&#8217;t thought of. Using AI for ideation is a great use case, not just for engineering, but for product features too, like UI/UX, important metrics to capture, etc. If it generates 20 ideas, there is a higher chance you spot the bad ones, filter them out, and clear your mind or steer it toward better ideas. Here is an example of getting ideas for fixing a recurring exception:</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img decoding="async" width="1024" height="810" src="https://blogit.create.pt/wp-content/uploads/2025/09/image-1024x810.png" alt="" class="wp-image-13535" style="width:761px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2025/09/image-1024x810.png 1024w, https://blogit.create.pt/wp-content/uploads/2025/09/image-300x237.png 300w, https://blogit.create.pt/wp-content/uploads/2025/09/image-768x607.png 768w, https://blogit.create.pt/wp-content/uploads/2025/09/image-531x420.png 531w, https://blogit.create.pt/wp-content/uploads/2025/09/image-696x550.png 696w, https://blogit.create.pt/wp-content/uploads/2025/09/image-1068x844.png 1068w, https://blogit.create.pt/wp-content/uploads/2025/09/image.png 1185w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Example of using AI to get multiple options</figcaption></figure>
</div>


<p></p>



<p>It asks clarifying questions so that I can give it more useful context. Then I can see the response, iterate, or ask for more ideas, etc. I almost always set these instructions for any LLM:</p>



<pre class="wp-block-code"><code>Ask clarifying questions before giving an answer. Keep explanations not too long. Try to be as insightful as possible, and remember to verify if a solution can be implemented when answering about Azure and architecture in general.
It's also very important for you to verify if there is official documentation that supports your claims and statements. Please find official documentation supporting your claims, before responding to a user. If there isn't documentation confirming your statement, don't include it in the response.</code></pre>



<p>That is also why it searches for docs. I&#8217;ve gotten way too many statements in LLM responses that, when I follow up on them, the model realizes were an error or an assumption. When I ask it further about the sentence it just gave me, I just get &#8220;You&#8217;re right &#8211; I was wrong about that&#8221;&#8230; Don&#8217;t become too over-reliant on these tools 😅.</p>



<h3 class="wp-block-heading">AI as a co-teacher</h3>



<p>With that said, the tech lead and senior devs are also responsible for upskilling their team by sharing knowledge and best practices, challenging juniors with more complex tasks, etc. And this part of the job isn&#8217;t that simple; it&#8217;s hard to be a force multiplier that improves everyone around you. So, what if the tech lead could use AI in this way, by creating&nbsp;<a href="https://code.visualstudio.com/docs/copilot/customization/prompt-files">reusable prompts</a>, documentation, and custom agents? How about the tech lead uses AI as a co-teacher, and then shares how to do it with the rest of the team? All of these can then help onboard juniors and help them understand our codebase and our domain. The&nbsp;<a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code best practices post</a>&nbsp;also references onboarding as a good use case that helps Anthropic engineers:</p>



<p><em>&#8220;At Anthropic, using Claude Code in this way has become our core onboarding workflow, significantly improving ramp-up time and reducing load on other engineers.&#8221;</em></p>



<p>A lot of onboarding time is spent on understanding the business logic and then how it&#8217;s implemented. For juniors, it&#8217;s also about the design patterns or codebase structure. So I really think this is a net-positive for the whole team.</p>



<h2 class="wp-block-heading">My augmentation list</h2>



<p>It might not be much, but these are essentially the tasks where I&#8217;m augmented by AI:</p>



<p><strong>Technical</strong>:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li><strong>Initial</strong>&nbsp;code review (e.g. nitpicks, typos), some stuff I should really just automate 😅</li>



<li>Generate summaries for the PR description</li>



<li>Architectural discussions, including trade-off and risk analysis
<ul style="max-width:960px" class="wp-block-list">
<li>Draft an ADR (Architecture decision record) based on my analysis and arguments</li>
</ul>
</li>



<li>Co-Teacher and Co-Worker
<ul style="max-width:960px" class="wp-block-list">
<li>&#8220;Deep Research&#8221; and discussion about possible solutions</li>



<li>Learn new tech with analogies or specific Azure features</li>



<li>Find new sources of information (e.g. blog posts, official docs, conference talks)</li>
</ul>
</li>



<li>Troubleshooting for specific infrastructure problems
<ul style="max-width:960px" class="wp-block-list">
<li>Generating KQL queries (e.g. rendering charts, analyzing traces &amp; exceptions &amp; dependencies)</li>
</ul>
</li>



<li>Refactoring and documentation suggestions</li>



<li>Generation of new unit tests given X scenarios</li>
</ul>



<p><strong>Non-technical</strong>:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Summarizing book chapters/blog posts or videos (e.g. NotebookLM)</li>



<li>Role play in various scenarios (e.g. book discussions)</li>
</ul>



<p>Of course, we also need to talk about the tasks that fall outside the Jagged Frontier. Again, these can vary from person to person. From my usage and experiments so far, these are the tasks that currently fall outside the frontier:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Being responsible for technical support tickets, where a customer encountered an error or has a question about our product. This involves answering the ticket, asking clarifying questions when necessary, opening up tickets on a 3rd party that are related to this issue, and then resolving the issue.</li>



<li>Deep valuable code review. This includes good insights, suggestions, and knowledge sharing to improve the PR author&#8217;s skills. <a href="https://www.coderabbit.ai/">CodeRabbit</a> does often give valuable code reviews, way better than any other solution. Still not the same as human review 🙂</li>



<li>Development of a v0 (or draft) for new complex features</li>



<li>Fixing bugs that require business domain knowledge</li>
</ul>



<p>Delegating some of those tasks would be cool, at least 50% 😄, while our engineering team focuses on other tasks. But oh well, maybe that day will come.</p>



<h2 class="wp-block-heading">AI-assisted coding</h2>



<p>AI-assisted coding can be very helpful on some tasks, and lately my goal is to increase the number of tasks AI can assist me with. In our team, we&#8217;ve read&nbsp;<a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code Best practices</a>&nbsp;in order to learn and see what fits our use case best. Then we dove deeper into some topics that post references; for example,&nbsp;<a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/extended-thinking-tips">these docs</a>&nbsp;were very useful for learning about Claude&#8217;s extended thinking feature, complementing the usage of &#8220;think&#8221; &lt; &#8220;think hard&#8221; &lt; &#8220;think harder&#8221; &lt; &#8220;ultrathink&#8221;. We also found&nbsp;<a href="https://simonwillison.net/2025/Apr/19/claude-code-best-practices/">this post by Simon Willison</a>&nbsp;about the feature interesting. In most tasks, an iterative approach, just like normal software development, is indeed way better than one-shotting with the perfect prompt. Still, when it takes too many iterations (some bugfixes were too complex because it&#8217;s hard to pinpoint the bug&#8217;s location), performance degrades and the result gets bad overall (infinite loading spinner of death 🤣).</p>



<p>Before we can use AI-assisted coding on more complex tasks, we need to improve the output quality. So we&#8217;ve invested a lot of time in fine-tuning custom instructions and meta-prompting. Let&#8217;s talk about these two.</p>



<h3 class="wp-block-heading" id="custom-instructions">Custom Instructions</h3>



<p>According to the Copilot docs, instructions should be short, self-contained statements. Most principles in&nbsp;<a href="https://learn.microsoft.com/en-us/training/modules/introduction-prompt-engineering-with-github-copilot/2-prompt-engineering-foundations-best-practices">prompt engineering</a>&nbsp;are about being short and specific, and making sure our critical instructions are something the model pays special attention to. As everyone keeps saying, the context window is very important, so it&#8217;s really good if we can keep the instruction file to around 200 lines. The longer our instructions are, the greater the risk that the LLM won&#8217;t follow them, since it can pay more attention to other tokens or forget relevant instructions. With that said, keeping instructions short is also a challenge when we use the few-shot prompting technique and add more examples.</p>



<p>To build our custom instructions, we used the C# and Blazor files from&nbsp;<a href="https://github.com/github/awesome-copilot/tree/main">the awesome-copilot repo</a>&nbsp;and other sources of inspiration like&nbsp;<a href="https://parahelp.com/blog/prompt-design">parahelp prompt design</a>&nbsp;to get a first version. We wanted to know what techniques other teams use. Then we made specific edits to follow our own guidelines and removed rules specific to explaining concepts, etc. We also added some&nbsp;<strong>capitalized words</strong>&nbsp;that are common in system prompts or commands, like IMPORTANT, NEVER, ALWAYS, MUST. The IMPORTANT line also sits at the end of the file, to try to&nbsp;<strong>refocus</strong>&nbsp;the attention on coding standards:</p>



<pre class="wp-block-code"><code>IMPORTANT: Follow our coding standards when implementing features or fixing bugs. If you are unsure about a specific coding standard, ask for clarification.</code></pre>



<p>I&#8217;m not 100% sure how this capitalization works, or why it works&#8230; and I have not found docs/evidence/research on this. All I know is that capitalized words have different tokens than their lowercase versions. It&#8217;s probably something the model pays more attention to, since in the training data these words signal importance. I do wish Microsoft, OpenAI, and Anthropic covered capitalization in their prompt engineering docs/tutorials.</p>



<p>It&#8217;s at the end of our file since it&#8217;s also&nbsp;<a href="https://huggingface.co/papers/2307.03172">being researched that the beginning and end of a prompt</a>&nbsp;are what the LLM pays more attention to and finds more relevant; some middle parts are &#8220;meh&#8221; and can be forgotten.&nbsp;<a href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering?tabs=chat#repeat-instructions-at-the-end">Microsoft docs</a>&nbsp;essentially say the same; it&#8217;s known as &#8220;<strong>recency bias</strong>&#8221;. In most prompts we see, this section exists at the end to refocus the LLM&#8217;s attention.</p>
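<p>The same recency-bias trick can be baked into any prompt-assembly step. A tiny Python sketch, assuming a hypothetical helper (the function name and structure are my own illustration) that repeats the critical rule at the very end of the assembled prompt:</p>

<pre class="wp-block-code"><code>def assemble_prompt(critical_rule, body_sections):
    """Place the critical rule first, then the body, then repeat the rule
    at the very end, where recency bias makes the model attend to it."""
    parts = [critical_rule] + list(body_sections) + ["IMPORTANT: " + critical_rule]
    return "\n\n".join(parts)</code></pre>

<p>This way, no matter how long the middle sections grow, the rule we care about most stays in the two positions the model weighs most heavily.</p>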



<h3 class="wp-block-heading">Meta-prompting</h3>



<p>Our goal also isn&#8217;t to have the perfect custom instructions and prompts, since refining them later with an iterative/conversational approach works well. But we came across the concept of&nbsp;<a href="https://cookbook.openai.com/examples/enhance_your_prompts_with_meta_prompting">meta-prompting</a>, a term that is becoming more popular. Basically, we asked Claude how to improve our prompt, and it gave us some cool ideas for improving our instructions/reusable prompts.</p>
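<p>In practice, the simplest form of meta-prompting is wrapping the existing instruction file in a critique request. A small Python sketch of how such a meta-prompt can be framed (the wording and function name are illustrative assumptions, not our exact setup):</p>

<pre class="wp-block-code"><code>def build_meta_prompt(instructions):
    """Wrap an existing instruction file in a critique-and-rewrite request."""
    return (
        "You are a prompt engineer. Review the instructions below.\n"
        "1. List ambiguities, contradictions, and missing edge cases.\n"
        "2. Propose a rewritten version that is shorter and more specific.\n"
        "Justify each change you propose.\n\n"
        "--- INSTRUCTIONS ---\n" + instructions
    )</code></pre>

<p>The model&#8217;s rewrite then goes through normal human review; we only keep the changes we actually agree with.</p>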



<p>But don&#8217;t forget to use LLMs with caution&#8230; I keep getting &#8220;You&#8217;re absolutely right&#8230;&#8221; and it&#8217;s annoying how sycophantic it is oftentimes 😅</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="696" height="398" src="https://blogit.create.pt/wp-content/uploads/2025/09/image-3.png" alt="" class="wp-image-13542" srcset="https://blogit.create.pt/wp-content/uploads/2025/09/image-3.png 696w, https://blogit.create.pt/wp-content/uploads/2025/09/image-3-300x172.png 300w" sizes="(max-width: 696px) 100vw, 696px" /></figure>
</div>


<p>The quality of the output is most likely affected by the complexity of the task I&#8217;m working on too. Prompting skills only go so far; from what I&#8217;ve researched and learned, I can say there is a learning curve to understanding LLMs. So we need to continue experimenting and learning about the layers between our prompt and the output we see.</p>



<h2 class="wp-block-heading">Resources</h2>



<p>This is not an exhaustive list by any means, just some resources I find very useful:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li><a href="https://www.youtube.com/watch?v=EWvNQjAaOHw&amp;t=7238s">Andrej Karpathy &#8211; How I use LLMs</a></li>



<li><a href="https://www.youtube.com/watch?v=LCEmiRjPEtQ">Andrej Karpathy: Software Is Changing (Again)</a>
<ul style="max-width:960px" class="wp-block-list">
<li>Related to this is&nbsp;<a href="https://natesnewsletter.substack.com/p/software-30-vs-ai-agentic-mesh-why">this post from Nate Jones</a></li>
</ul>
</li>



<li><a href="https://www.youtube.com/watch?v=tbDDYKRFjhk">Does AI Actually Boost Developer Productivity? (100k Devs Study) &#8211; Yegor Denisov-Blanch, Stanford</a></li>



<li><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code: Best practices for agentic coding</a></li>



<li><a href="https://zed.dev/blog/why-llms-cant-build-software">Why LLMs Can&#8217;t Really Build Software</a></li>



<li><a href="https://www.youtube.com/watch?v=-1yH_BTKgXs">Is AI the Future of Software Development, or Just a new Abstraction? Insights from Kelsey Hightower</a></li>



<li><a href="https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide">GPT-5 prompting guide</a></li>
</ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p>I&#8217;ve enjoyed learning and improving myself over the years. But with GenAI I now feel like I can learn a lot more and improve even further, since I&#8217;m choosing these as&nbsp;<strong>augmentation tools</strong>. Hopefully, this article motivates you to pursue AI augmentation for yourself. It&#8217;s okay to be skeptical about all the hype you see and hear around these tools; it&#8217;s a good mechanism for not falling for the sales pitches and fluff that CEOs and others in the industry push. Just don&#8217;t let your skepticism prevent you from learning, experimenting, building your own opinion, and finding ways to improve your work 🙂.</p>



<p>Still&#8230; I can&#8217;t deny my curiosity to know more about how these systems work underneath. How is fine-tuning done exactly? How does post-training work? Can these models emit telemetry (logs, traces, metrics) that we can observe? Why does capitalization (e.g. IMPORTANT, MUST) or setting a&nbsp;<a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts">role/persona</a>&nbsp;improve prompts? Can we really not have access to a high-level tree with the weights the LLM uses to correlate tokens, and use it to justify why a given output was produced? Or why an instruction given as input was not followed? It&#8217;s okay to just have a basic understanding and know about the new abstractions we have with these LLMs. But knowing how that abstraction works leads to knowing how to transition to automation.</p>



<p>I will keep searching and learning more in order to answer these questions or find engineers in the industry who have answered them. Especially around&nbsp;<strong>interpretability research</strong>, which is amazing!!! I recommend reading this research, for example &#8211;&nbsp;<a href="https://www.anthropic.com/research/tracing-thoughts-language-model">Tracing the thoughts of a large language model</a>. Hope you enjoyed reading, feel free to share in the comments below how you use AI to augment yourself 🙂.</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/">Becoming augmented by AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
