<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI Archives - Blog IT</title>
	<atom:link href="https://blogit.create.pt/tag/ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://blogit.create.pt/tag/ai/</link>
	<description>Create IT blogger community</description>
	<lastBuildDate>Mon, 15 Jun 2026 08:24:34 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>
	<item>
		<title>How we use agentic coding tools in our favor &#8211; Copilot</title>
		<link>https://blogit.create.pt/davidpereira/2026/06/15/how-we-use-agentic-coding-tools-in-our-favor-copilot/</link>
					<comments>https://blogit.create.pt/davidpereira/2026/06/15/how-we-use-agentic-coding-tools-in-our-favor-copilot/#respond</comments>
		
		<dc:creator><![CDATA[David Pereira]]></dc:creator>
		<pubDate>Mon, 15 Jun 2026 08:24:31 +0000</pubDate>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GenAI]]></category>
		<guid isPermaLink="false">https://blogit.create.pt/?p=13621</guid>

					<description><![CDATA[<p>Table of Contents Introduction This blog post is part of a series where I share how AI is augmenting my work, and what I&#8217;m learning from it. If you&#8217;re interested, you can read the second post here:&#160;Lessons learned improving code reviews with AI. In that post, I reference how we are adopting AI for code [&#8230;]</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2026/06/15/how-we-use-agentic-coding-tools-in-our-favor-copilot/">How we use agentic coding tools in our favor &#8211; Copilot</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Table of Contents</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li>Introduction</li>



<li>Planning the experiment
<ul style="max-width:960px" class="wp-block-list">
<li>Cost considerations</li>
</ul>
</li>



<li>Testing Copilot coding agent
<ul style="max-width:960px" class="wp-block-list">
<li>Code review comments</li>



<li>Improvements mid-way</li>



<li>Our pain points</li>



<li>Performance considerations</li>
</ul>
</li>



<li>Main takeaways</li>



<li>Resources</li>



<li>Conclusion</li>
</ul>



<h2 class="wp-block-heading">Introduction</h2>



<p class="wp-block-paragraph">This blog post is part of a series where I share how AI is augmenting my work, and what I&#8217;m learning from it. If you&#8217;re interested, you can read the second post here:&nbsp;<a href="https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/">Lessons learned improving code reviews with AI</a>. In that post, I reference how we are adopting AI for code reviews. This one is a deep dive into how we experimented with GitHub Copilot coding agent, to see how it could fit our team&#8217;s needs. Truth be told, we use Claude Code a lot more: it&#8217;s the tool we are focused on and have adopted, but that will be a separate blog post in this series 😆.</p>



<p class="wp-block-paragraph">Our approach was simple: experiment in order to learn what works. We are still learning and improving, but the more we use and optimize these tools, the more leverage we gain as a team. Let&#8217;s get into the details.</p>



<h2 class="wp-block-heading">Planning the experiment</h2>



<p class="wp-block-paragraph">Okay… we have all heard of Copilot coding agent by now. You have probably heard GitHub say, at the <a href="https://youtu.be/P6Va0_KILi4?t=410">keynote</a> and in <a href="https://github.com/resources/events/github-roadmap-webinar-q1">their Q1 2026 roadmap webinar</a>, that this agent is the number 1 contributor to their code base.</p>



<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="540" src="https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-1024x540.png" alt="" class="wp-image-13622" srcset="https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-1024x540.png 1024w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-300x158.png 300w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-768x405.png 768w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-796x420.png 796w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-696x367.png 696w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772-1068x563.png 1068w, https://blogit.create.pt/wp-content/uploads/2026/06/508718662-5386183e-8902-43eb-b6af-c48b9d715772.png 1130w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">Well&#8230; to me this is in the same category as the&nbsp;<a href="https://www.linkedin.com/posts/gergelyorosz_exactly-one-year-ago-10-mar-2025-dario-activity-7437104256219959296-nsS2">statement that Dario</a>, CEO of Anthropic, made: &#8220;in 3-6 months AI is writing 90% of the code&#8221;. I&#8217;m glad it works for them, and they can spend their marketing budget and strategy on these slides and statements. But it&#8217;s not the metric I care about. I&#8217;m fine not having to type on a keyboard to write 90% of the code, but measuring LOC just doesn&#8217;t make sense to me. It says nothing about the quality of the merged code, bugs introduced, etc. They are hype-driven statements in my opinion 🤣. Nevertheless, it leaves this question in my mind: could coding agents work effectively and produce high-quality PRs?</p>



<p class="wp-block-paragraph">We have been doing quite a bit of experimenting with GitHub Copilot coding agent and Claude Code, to try to answer this question. Maybe we can replace most of our typing on a keyboard with prompting. Our goal is nothing like GitHub&#8217;s: we have nothing to sell&#8230; they do 😅. Our motivation is to keep improving the way we work and bring value to real customers. So I&#8217;ll share what we have done and experimented with GitHub Copilot coding agent 🙂.</p>



<p class="wp-block-paragraph">We planned this before the&nbsp;<a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/">change to usage-based billing</a>. Since I still had 70% of my premium requests left in August, and they reset every month, we were like:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="498" height="385" src="https://blogit.create.pt/wp-content/uploads/2026/06/evil-laugh.gif" alt="" class="wp-image-13623" /></figure>
</div>


<p class="wp-block-paragraph">Why not spend them all on a ton of coding agent experiments 😄! So we did 😆. Our approach was this:</p>



<ul style="max-width:985px" class="wp-block-list">
<li>Pick 1 task that is prioritized for our next release to give to the coding agent</li>



<li>Pick 1 other backlog item or general task we would like to be done, some bugfixes, or code improvements</li>



<li>Assign all tasks to a coding agent (most would be Copilot, others would go to <a href="https://github.blog/changelog/2026-02-04-claude-and-codex-are-now-available-in-public-preview-on-github/">Claude third-party agent</a>)</li>



<li>Go do another task, then after a while, review PRs</li>



<li>Report on the premium request usage + number of PRs + quality of the output + number of comments</li>



<li>Start the cycle again with more tasks the next month</li>
</ul>



<p class="wp-block-paragraph">Again, our goal is to experiment in order to learn what works. We did this experiment in August 2025, some other months and again in March 2026 (mainly since there were many improvements introduced). It&#8217;s important to note our focus was on Copilot, not any other&nbsp;<a href="https://github.blog/changelog/2026-02-04-claude-and-codex-are-now-available-in-public-preview-on-github/">third-party agent</a>. We did not use Codex, and only used Claude on some of the tasks for this experiment.</p>






<h3 class="wp-block-heading">Cost considerations</h3>



<p class="wp-block-paragraph">We did not analyze or think a lot about costs. The goal was to experiment and see the quality of the PRs on different tasks. But suffice it to say, the <a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/">billing change</a> is a necessary change. For the cost of 1 premium request, we were able to run tasks that hit the 59min timeout or made a lot of tool calls, which clearly should cost more than that, like:</p>



<figure class="wp-block-table is-style-regular"><table class="has-fixed-layout"><thead><tr><th class="has-text-align-center" data-align="center">Prompt</th><th class="has-text-align-center" data-align="center">Branch changes against master</th><th class="has-text-align-center" data-align="center">Output</th></tr></thead><tbody><tr><td class="has-text-align-center" data-align="center">please check all work that is done here in this branch, vs the master branch, and do a thorough code review using all skills available. Focus on bugs and then code quality too. Use multiple subagents, each with their own perspective and goal</td><td class="has-text-align-center" data-align="center">~47,300 additions and ~11,000 deletions</td><td class="has-text-align-center" data-align="center">Error hitting 59min timeout. Used 4 subagents. We hit the error&nbsp;<code>model_max_prompt_tokens_exceeded</code>&nbsp;with the message &#8220;prompt token count of 530706 exceeds the limit of 64000&#8221;</td></tr><tr><td class="has-text-align-center" data-align="center">improve memory consumption of function X. Acceptance criteria: Memory should not exceed Y, regardless of the amount of items being processed.</td><td class="has-text-align-center" data-align="center">~900 additions and ~40 deletions</td><td class="has-text-align-center" data-align="center">Success in 53min</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">I&#8217;m sure you have seen plenty of engineers using a lot of inference for 1 premium request or for 20 dollars of a Claude Code subscription 😅. Anyway, if we were to adopt Copilot coding agent in June, we would need more controls over GitHub Actions minutes and overall token usage, and we would need to re-evaluate the cost-benefit.</p>



<h2 class="wp-block-heading">Testing Copilot coding agent</h2>



<p class="wp-block-paragraph">I&#8217;ll share an approximation of the results we got:</p>



<figure class="wp-block-table is-style-regular has-regular-font-size"><table class="has-fixed-layout"><thead><tr><th>Author</th><th>PRs</th><th>Merged</th><th>Merge rate</th></tr></thead><tbody><tr><td>Copilot coding agent</td><td>~130</td><td>~30</td><td><strong>~23%</strong></td></tr><tr><td>Third-party Claude agent</td><td>~5</td><td>0</td><td><strong>0%</strong></td></tr><tr><td>Human developers</td><td>~400</td><td>~390</td><td><strong>~98%</strong></td></tr></tbody></table></figure>



<figure class="wp-block-table is-style-regular"><table class="has-fixed-layout"><thead><tr><th>Size (lines changed)</th><th class="has-text-align-left" data-align="left">Copilot/Claude PRs</th></tr></thead><tbody><tr><td>S (10-49)</td><td class="has-text-align-left" data-align="left">~12</td></tr><tr><td>M (50-199)</td><td class="has-text-align-left" data-align="left">~48</td></tr><tr><td>L (200-999)</td><td class="has-text-align-left" data-align="left">~55</td></tr><tr><td>XL (1000+)</td><td class="has-text-align-left" data-align="left">~20</td></tr></tbody></table></figure>



<p class="wp-block-paragraph">Again, this is mainly an experiment so merge rate is expected to be very low. A lot of Copilot coding agent PRs were spikes/exploration or using GitHub custom agents to analyze PRDs, do security reviews, etc.</p>



<p class="wp-block-paragraph">Let&#8217;s talk about the quality of the PR when Copilot asked me for a review. First and foremost, I don&#8217;t think we have a high bar for quality PRs compared to other successful software teams. To me, a high-quality PR is expected, always, period. Second, many draft PRs I&#8217;ve created, and seen other engineers create, are a v0: a version we publish to get feedback from engineers on our team, never actually ready to be merged. All Copilot PRs are created as drafts. To me, this signals Copilot really just did a v0, even if it says it has completed everything and everything works. My current opinion is that this is done on purpose, to give you a chance to steer Copilot again in its implementation, and do an initial code review to spot things that are wrong.</p>



<p class="wp-block-paragraph">I&#8217;ve not seen any docs or official statements from GitHub supporting my claim. With that said, I&#8217;d like an option to let Copilot continue iterating on its PR and only ask me for a review when it&#8217;s no longer a draft. But this coding agent might not evolve in that direction, since GitHub&#8217;s marketing and docs so far are focused on small &amp; medium tasks. Cost control becomes very important too with long-running agents.</p>



<p class="wp-block-paragraph">To simplify things, we&#8217;ll say Copilot asking me for a review is the same as another engineer asking me to review their PR. Any engineer in our team (or generally in the world) only assigns a co-worker a PR for review once the PR is ready: the work is finished, and they have tested and reviewed it themselves. From our experiment so far, Copilot was not able to deliver a ready and polished PR most of the time, so I need to leave a lot of comments pointing out everything that is wrong. One of the problems is the feedback loop: we couldn&#8217;t get the Playwright MCP to work for us, since we have limitations on the front-end login flow. So the agent doesn&#8217;t deliver the necessary code for front-end tasks.</p>



<h3 class="wp-block-heading">Code review comments</h3>



<p class="wp-block-paragraph">In terms of the comments I made, alongside CCR (Copilot code review) and Claude Code, before marking a PR as ready for review, it&#8217;s around 20-30. The PRs that got merged usually had under 20 comments, and most discussions weren&#8217;t critical or about low-quality code. Again, it appears to me these were mostly low-to-medium complexity tasks that were clearly defined, and the agent did well. Our closed PRs and a few of the experiments got 40+ comments, for various reasons:</p>



<ul style="max-width:985px" class="wp-block-list">
<li>Unnecessary test cases</li>



<li>Re-implementation of certain modules and functions was necessary</li>



<li>Removed existing functionality</li>



<li>Low quality code and not adhering to coding standards and best practices (e.g. a lot of duplicated code, missing error handling)</li>



<li>Missing front-end implementation</li>



<li>Usage of non-existent CSS classes</li>
</ul>



<p class="wp-block-paragraph">The ones that got closed and were simple experiments didn&#8217;t receive much review. I understand that isn&#8217;t a great thing for the experiment, but we simply invested more time in some PRs rather than all of them. Again, some are just spikes or over the top on purpose.</p>



<p class="wp-block-paragraph">It&#8217;s not the same thing as our own team PRs, of course, since these draft PRs are done in ~15min. But the number of comments necessary to have these draft PRs ready to be reviewed by another human (and AI tools, like Copilot itself, Claude and CodeRabbit) is important, since it&#8217;s time I&#8217;m spending reviewing code. I don&#8217;t want to be bothered when there are still typos and the acceptance criteria are not fully met 😅.</p>



<h3 class="wp-block-heading">Improvements mid-way</h3>



<p class="wp-block-paragraph">There were features Copilot coding agent didn&#8217;t do very well, which prompted me to request a way for it to&nbsp;<a href="https://github.com/copilot-coding-agent/user-feedback/issues/84">ask clarifying questions</a>. The agent dashboard is a lot better nowadays, and we can start with this&nbsp;<a href="https://docs.github.com/en/copilot/tutorials/cloud-agent/get-the-best-results#researching-planning-and-iterating-before-opening-a-pull-request">type of planning</a>&nbsp;and make it ask clarifying questions too. I only experimented with this a few times, mostly because we started to add more context and details in the GitHub issues, and because steering costs more premium requests. This also matches Cursor&#8217;s best practice of&nbsp;<a href="https://cursor.com/blog/agent-best-practices">&#8220;plan before coding&#8221;</a>, which is mentioned everywhere and by all AI labs for good reason.</p>



<p class="wp-block-paragraph">After some PRs, we would also tweak the&nbsp;<code>instructions.md</code>&nbsp;to see if it improves anything. It&#8217;s a bit hard to know for sure whether changes to our prompts/instructions really improve the LLM&#8217;s output quality. Only by experimenting and tweaking can we really see whether future PRs turn out better. We also didn&#8217;t configure&nbsp;<code>copilot-setup-steps.yml</code>: there weren&#8217;t many options we wanted to configure in this file for our experiment. We do know the&nbsp;<a href="https://docs.github.com/en/copilot/how-tos/copilot-on-github/customize-copilot/customize-cloud-agent/customize-the-agent-environment#customizing-copilots-development-environment-with-copilot-setup-steps">max timeout for the coding agent</a>&nbsp;is currently 59 minutes.</p>



<p class="wp-block-paragraph">GitHub also shipped the ability for the coding agent to&nbsp;<a href="https://github.blog/ai-and-ml/github-copilot/whats-new-with-github-copilot-coding-agent/#h-pull-requests-that-arrive-in-better-shape">use Copilot code review</a>, and it runs CodeQL as well. This is great: some of our pain points were partly addressed here, since it prevents some issues from reaching a human reviewer. Still&#8230; we had issues and opinions on the PRs we experimented with and saw, so let&#8217;s go through them now 🙂.</p>



<h3 class="wp-block-heading">Our pain points</h3>



<h4 class="wp-block-heading">Tests</h4>



<p class="wp-block-paragraph">There were several times when Copilot didn&#8217;t run all unit tests. Or Copilot said &#8220;tests pass&#8221; when in fact it didn&#8217;t wait for all tests to finish, so it can&#8217;t know if tests pass&#8230; Here is an example of a comment I left Copilot after I reviewed the PR:</p>



<pre class="wp-block-code"><code>"copilot" there are several issues and missing implementation. Please make all the following changes:

## Front end
- **Missing** the entire front-end implementation, please make the necessary changes using the design system and with the acceptance criteria in the GitHub issue

## Testing
- Please please follow coding standards on all methods
- You should include unit tests to your implementation of X
- Delete all assertions of the `exception.Message`, because it's something that can change, and that makes it a fragile test</code></pre>



<p class="wp-block-paragraph">I read some session logs and found interesting things. Sure, I didn&#8217;t specify a lot about what unit tests to run in the prompt, but I&#8217;d actually prefer running all unit tests since we can make changes that break other areas in our codebase. But, in the end, it also didn&#8217;t wait for tests to run and see if everything passes. I don&#8217;t expect to see the wording &#8220;tests pass&#8221; if the agent simply didn&#8217;t wait for them to finish. Honestly, this is not that bad, we can run them ourselves or later in our CI check… but again I want to refine our instructions file in order for coding agents to always follow them and produce better quality PRs. Instruction following depends on the LLM, but there are still improvements here for sure.</p>



<h4 class="wp-block-heading">Doesn&#8217;t follow our PR template</h4>



<p class="wp-block-paragraph">It just doesn&#8217;t follow our PR template. Sure, it&#8217;s a small thing, maybe a temporary limitation. But more generally, Copilot publishes a comment on the PR saying &#8220;Fixed! This is done&#8230;.&#8221;, and then I see that it&#8217;s not done and the PR description is something like this:</p>



<pre class="wp-block-code"><code>## Definition of Done
- &#091;x] PR follows template format 
- &#091;x] Code review comments addressed
- &#091;x] Implementation follows C# coding standards  
- &#091;x] Build warnings fixed
- &#091;x] Core functionality implemented
- &#091; ] Final Application compilation issues resolved
- &#091; ] All tests passing</code></pre>



<p class="wp-block-paragraph">Every time, I&#8217;m like:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="498" height="413" src="https://blogit.create.pt/wp-content/uploads/2026/06/ugh-no-michael-scott-no.gif" alt="" class="wp-image-13627" /></figure>
</div>


<h4 class="wp-block-heading">Lack of docs on installed tools and observability</h4>



<p class="wp-block-paragraph">This is a nitpick, I know, but while reading the session logs I found that the Copilot coding agent has access to python3. I didn&#8217;t know this from the Copilot coding agent docs, but it makes sense since our GitHub Actions runner&nbsp;<a href="https://github.com/actions/runner-images/blob/ubuntu24/20250824.1/images/ubuntu/Ubuntu2404-Readme.md">uses Ubuntu</a>. I mean, we have the firewall on, but it would be great to know how to deny access to these tools. I&#8217;m also not about to dunk too hard on GitHub about the observability around this feature, because they rely on GitHub Actions. We all know what telemetry we can get out of those&#8230; From an engineering perspective, Copilot coding agent lacks a lot, I mean a lot, when it comes to observability. No OpenTelemetry, no nothing. It&#8217;s clearly not a priority or concern for them. I can understand that, but I don&#8217;t agree with that decision. Claude Managed Agents has some features like tracing, but I guess not a lot of companies have observability as a priority or concern for cloud agents.</p>



<h4 class="wp-block-heading">No reasoning around a simpler solution</h4>



<p class="wp-block-paragraph">We saw a few PRs where the agent simply jumps to and fixates on the first solution, without reasoning about the trade-offs and alternatives. We&#8217;ll dive deeper into one scenario concerning performance considerations, but for now I&#8217;ll keep it light. In one task that we assigned to the Claude agent, the bug we wanted to fix was dead simple. It&#8217;s one function, a&nbsp;<code>string</code>&nbsp;extension method, that was not handling edge cases correctly when parsing class names. The solution in the PR used&nbsp;<code>StringBuilder</code>&nbsp;and a&nbsp;<code>for</code>&nbsp;loop with some logic to decide how to parse and handle the edge case. It&#8217;s not wrong, but I prefer simpler code. Sure, with an initial prompt that says something like &#8220;don&#8217;t forget to code review your solution at the end&#8221;, perhaps it would have considered whether there were simpler solutions, using Regex for example. Maybe the Claude third-party agent can&#8217;t do that and only Copilot coding agent can, no idea though.</p>



<h4 class="wp-block-heading">Reliability problems</h4>



<p class="wp-block-paragraph">We experienced errors sometimes, or hit unfortunate limitations or bugs. Of the ~130 Copilot PRs, we got around 30 failed GitHub Actions runs. They failed due to various errors, but sometimes I can&#8217;t even know why: when the session fails, I can&#8217;t always see the full logs of that job run. For example, the GitHub Actions UI only shows &#8220;This job failed&#8221; with the annotation &#8220;Unhandled exception. System.IO.IOException: No space left on device&#8221;. Well&#8230; great, thanks for the info. Couldn&#8217;t you truncate or do something to reliably show me some verbose logs? What contributed most to disk space? Is the agent getting too much output in tool calls that is saved in files on disk? What tools produced the most output tokens? Did the agent make tool calls that are inefficient and wrong? What happened exactly? Not the best UX&#8230; Sure, there are larger GitHub Actions runners. But I don&#8217;t want to throw money at a problem I don&#8217;t know the root cause of&#8230;</p>



<p class="wp-block-paragraph">In some sessions we hit the 59min timeout, but I feel like we shouldn&#8217;t. One Copilot coding agent session was a code review of a branch with this prompt: &#8220;please check all work that is done here in this branch, vs the master branch and do a thorough code review using all skills available. Focus on bugs and then code quality too. Use multiple subagents each with their own perspective and goal&#8221;. I wasn&#8217;t expecting a 59min run even with 5 subagents. Then I saw this in the logs:</p>



<pre class="wp-block-code"><code>20:25:43.4654572Z Start flushing callbacks
20:53:35.0823201Z ::***::
20:53:40.0920377Z ##&#091;error]The operation was canceled.</code></pre>



<p class="wp-block-paragraph">What is this? Why did the Actions runner take ~30min to flush callbacks 😅? The code review was done already, so I don&#8217;t understand why it failed the whole job. It&#8217;s a bit frustrating to spend these Actions minutes&#8230;</p>



<p class="wp-block-paragraph">Also, we assigned Copilot to an issue and immediately got a comment saying&nbsp;<em>&#8220;The agent encountered an error and was unable to start working on this issue: This may be caused by a repository ruleset violation. See granting bypass permissions for the agent, or please contact support if the issue persists. (Request id: X).&#8221;</em>&nbsp;Well&#8230; no, I know for a fact it&#8217;s not a ruleset violation or permissions related. Next, I assigned the Claude agent to this issue and it worked. Just to test it again, I assigned Copilot to this issue again after some days. It started working and made a WIP PR, until it failed with this error:</p>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="213" src="https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-1024x213.png" alt="" class="wp-image-13628" srcset="https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-1024x213.png 1024w, https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-300x62.png 300w, https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-768x160.png 768w, https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-696x145.png 696w, https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366-1068x222.png 1068w, https://blogit.create.pt/wp-content/uploads/2026/06/603845628-08fef871-f57d-4639-b780-7c7f24b74366.png 1206w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p class="wp-block-paragraph">Then I looked at the agent session logs and found these types of errors:</p>



<pre class="wp-block-code"><code>stderr: "fatal: path 'Tests/SuperCoolTests.cs' exists on disk, but not in 'X'\n"
(...)
&#091;cca-engine] Failed to disconnect session X cleanly: Error: Connection is disposed.
&#091;cca-engine] Fatal: Error: Execution failed: CAPIError: 422 422 Unprocessable Entity</code></pre>



<p class="wp-block-paragraph">The fatal error &#8220;exists on disk&#8221; happened 6 times during the code review step with CodeQL. Is it the root cause of the 422 error? Maybe not, I don&#8217;t know. Sure, these might be &#8220;nitpicks&#8221;, but in a way they say something about the reliability of this feature. Reliability is something users will cancel subscriptions over if they find alternatives that are reliable and fit their needs.</p>



<h3 class="wp-block-heading">Performance considerations</h3>



<p class="wp-block-paragraph">One of the more interesting things the Copilot coding agent did, which I didn&#8217;t like, was to refactor this in one PR:</p>



<pre class="wp-block-code"><code>var tasks = items.Select(item =&gt; ProcessItemAsync(item));
await Task.WhenAll(tasks);</code></pre>



<p class="wp-block-paragraph">Into this:</p>



<pre class="wp-block-code"><code>foreach (var item in items)
{
    await ProcessItemAsync(item);
}</code></pre>



<p class="wp-block-paragraph">I read the session log to understand why it did this, because in this case it is not the best solution. For context, this PR is supposed to fix a concurrency issue caused by how we use <code>OpenConnectionAsync</code> in the DbContext. The code throws the exception <code>System.InvalidOperationException: The connection was not closed. The connection's current state is connecting</code>. A single DbContext isn&#8217;t thread-safe, so concurrent calls through <code>Task.WhenAll</code> that share one instance <a href="https://learn.microsoft.com/en-us/ef/core/dbcontext-configuration/#avoiding-dbcontext-threading-issues">can fail like this</a>. I don&#8217;t want sequential processing just to fix this concurrency issue. So here is the snippet of the agent&#8217;s internal reasoning in the session logs:</p>



<pre class="wp-block-code"><code><em>"The real problem surfaces in X.cs, where multiple concurrent calls to Y are fired simultaneously for different IDs. Since they all share the same DbContext and connection, this concurrent access causes contention on a single connection that's never properly closed. The fix requires addressing both the connection lifecycle and the concurrency issue. While adding try/finally blocks to close connections in the repository methods would help, the underlying problem is that a single DbContext isn't thread-safe, so concurrent calls through Task.WhenAll are fundamentally problematic. The most practical immediate solution is to ensure connections are properly closed after each repository method completes, which should at least prevent the connection exhaustion that's currently happening. that open connections
2. In X.cs, change from concurrent Task.WhenAll to sequential processing to avoid concurrent use of the same DbContext"</em></code></pre>



<p class="wp-block-paragraph">It just seems it was focused on fixing the concurrency issue, not considering the performance trade-off of this solution. Actually, it does seem some other PRs the coding agent creates have this behaviour as well: jumping to and fixating on a solution, without reasoning about the trade-offs and alternatives (this was a Claude Opus 4.6 Copilot coding agent session, by the way 😅). I mean&#8230; should we have given the context of how big this&nbsp;<code>items</code>&nbsp;list can be in production? Should we ask explicitly to consider the performance of its code? Well, maybe, but I was hoping Opus 4.6 could think like an engineer 😅. Either way, maybe a better feedback loop would help the agent as well, like performance tests that the agent can run after making this fix. That way, the agent could measure the cost of the code it&#8217;s writing. I guess that is what we should all do anyway: improve these feedback loops, since they help agents and humans do better work. Sounds obvious, but it probably could have really improved this PR 😅.</p>



<p class="wp-block-paragraph">We ended up refactoring this code in this PR because it doesn&#8217;t even make sense to process this list with a <code>Task.WhenAll(tasks)</code> when we can make a better DB query that is more performant and cleaner.</p>
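


<p class="wp-block-paragraph">For cases where a single better query isn&#8217;t an option, there is a middle ground between the original <code>Task.WhenAll</code> and the sequential loop: cap the concurrency and give each work item its own context, so no context is ever shared between tasks. Here is a minimal sketch of that idea, not the code from our PR. <code>FakeContext</code>, <code>BoundedProcessor</code> and <code>ProcessAllAsync</code> are hypothetical names, and <code>FakeContext</code> only simulates a DB round trip; in a real EF Core app, I&#8217;d assume the factory role is played by something like <code>IDbContextFactory</code>.</p>



<pre class="wp-block-code"><code>using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a DbContext: one instance per work item, never shared across tasks.
public sealed class FakeContext : IAsyncDisposable
{
    public async Task&lt;int&gt; ProcessItemAsync(int item)
    {
        await Task.Delay(10); // simulates a DB round trip
        return item * 2;
    }

    public ValueTask DisposeAsync() =&gt; ValueTask.CompletedTask;
}

public static class BoundedProcessor
{
    // Processes all items concurrently, but never more than maxConcurrency at a time.
    public static async Task&lt;int[]&gt; ProcessAllAsync(
        IReadOnlyList&lt;int&gt; items, Func&lt;FakeContext&gt; contextFactory, int maxConcurrency)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =&gt;
        {
            await gate.WaitAsync(); // wait for a free slot
            try
            {
                // Assumption: creating a context per item is cheap enough for this workload.
                await using var context = contextFactory();
                return await context.ProcessItemAsync(item);
            }
            finally
            {
                gate.Release(); // free the slot even if processing throws
            }
        });

        // Results come back in the same order as the input items.
        return await Task.WhenAll(tasks);
    }
}</code></pre>



<p class="wp-block-paragraph">Calling <code>ProcessAllAsync(items, () =&gt; new FakeContext(), maxConcurrency: 4)</code> keeps at most four items in flight. The right cap depends on things like the connection pool size, so treat that number as something to measure, not a recommendation.</p>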



<h3 class="wp-block-heading">Main takeaways</h3>



<p class="wp-block-paragraph">So did Copilot coding agent do a good job? Well, I&#8217;d argue it could have been better, so I&#8217;m curious to see if we can give even better instructions and provide more context. Including useful context for Copilot directly in the issue description is always a good idea, like pointing to relevant files so it can skip some of the searching and grepping. I also acknowledge our feedback loop could be better, and improving it would help the coding agent for sure. Also&#8230; mister LLM was not running all tests and waiting for them to finish. So they could be failing, but it was all good for the coding agent&#8230; well, not good enough for me 😅. What I really don&#8217;t want is for the coding agent to say in the end &#8220;All tests pass&#8221;, and then I check the full logs and see &#8220;(&#8230;) ok tests are taking too long to build and run. I&#8217;ll proceed with the other tasks.&#8221;</p>



<p class="wp-block-paragraph">The workflow of giving work to the agent, going off to do something else entirely, and coming back to review worked well. Copilot sessions take around 15 minutes, so I enjoy having the agent work in the background instead of having it in my VS Code, waiting for me to approve commands or provide feedback. If I can steer it in the right direction from the start, it tends to do a decent job for the initial PR. The challenge is reducing the number of iterations in a PR until it&#8217;s considered done. Having agents work in the background can lengthen the feedback loop of: getting code -&gt; reviewing code -&gt; asking for revisions.</p>



<p class="wp-block-paragraph">However, delegating a task to an&nbsp;<strong>autonomous cloud agent</strong>&nbsp;and reviewing big PRs at the end is a fundamentally different workflow from iterative, step-by-step collaboration (e.g. VS Code Agent mode, or CLI). Sure, it&#8217;s cool to delegate some PRs at the end of the day, and then come back tomorrow to review that code. But it&#8217;s not very practical unless the quality of that PR is high or the task is very small in scope. I see a lot of engineers in the industry enjoying cloud agents a lot, but I still prefer running coding agents locally with iterative back-and-forth collaboration, and then creating a PR from that (plus I can gather more telemetry locally 🙂). Like Stephen Toub said&nbsp;<a href="https://devblogs.microsoft.com/dotnet/ten-months-with-cca-in-dotnet-runtime/#iteration-is-expected">in his blog post</a>, iteration is expected:</p>



<pre class="wp-block-code"><code><em>If you expect CCA to get it right the first time with zero human involvement, you’ll be disappointed a non-trivial percentage of the time. Expect multiple rounds of review feedback with you providing clear, specific, and actionable feedback.</em></code></pre>



<p class="wp-block-paragraph">I just prefer to do it locally. The one scenario where I truly prefer autonomous cloud coding agents is when I&#8217;m on-call doing monitoring and SRE type work: handling support tickets, checking logs, dashboards and exceptions, and looking for possible improvements to our runbooks and overall codebase. I can assign tasks like bug fixing or test coverage gaps to the cloud agent, then go back to Grafana. An hour later I come back and assign Copilot Code review and Claude, then go back to monitoring. When the reviews are done, I tell the cloud agent to fix all bugs or issues. The next day, before my SRE type work, I can do some code review on a PR that is in a better state, test it myself and go on from there. Cloud coding agents fit the workflow of delegating these tasks, and it works well for me.</p>



<p class="wp-block-paragraph">So in short, for the clear well-defined tasks the agent produced a good quality PR that got merged sometimes. For the complex features and bug fixes, that require searching and understanding many files in the codebase, it does a worse job. If the task is complex, it will require more thinking, reading multiple projects and just raw domain knowledge. It still provided value in the PRs we experimented on, since a lot of the tasks we experimented on were indeed medium complexity. Honestly, we have other tools that produce good quality PRs for clear well-defined tasks.</p>



<h2 class="wp-block-heading">Resources</h2>



<ul style="max-width:985px" class="wp-block-list">
<li><a href="https://devblogs.microsoft.com/dotnet/ten-months-with-cca-in-dotnet-runtime/">Ten Months with Copilot Coding Agent in dotnet/runtime</a></li>



<li><a href="https://visualstudiomagazine.com/articles/2025/07/15/copilot-is-rising-all-time-contributor-to-net-maui-repo.aspx">.NET MAUI team&#8217;s experience with Copilot coding agent</a></li>



<li><a href="https://devblogs.microsoft.com/dotnet/maui-team-copilot-tips/">How the .NET MAUI Team uses GitHub Copilot for Productivity</a></li>



<li><a href="https://simonw.substack.com/p/agentic-engineering-patterns">Simon Willison &#8211; Agentic Engineering Patterns</a></li>



<li><a href="https://cursor.com/blog/agent-best-practices">Best practices for coding with agents</a></li>
</ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p class="wp-block-paragraph">We will keep experimenting a little with GitHub Copilot coding agent or other agentic tools the Copilot subscription supports (e.g. OpenCode, Codex). But it&#8217;s fair to say we&#8217;ll be doubling down on our adoption of Claude Code as our agentic coding tool. Like I&#8217;ve said in the posts of this series, the Jagged Frontier keeps moving, and knowing whether the task you give these tools falls inside the frontier defines how much you are augmented. If we can get more of the low-medium complexity tasks done right, reliably, while ensuring quality along the way, I&#8217;m certain we will be very happy and continue working on more complex tasks that provide value to our customers. Since I have seen LLMs lack the judgement, trade-off analysis and decision making engineers have, I prefer the collaboration I can have in local sessions over a cloud agent session.</p>



<p class="wp-block-paragraph">Don&#8217;t forget to stay critical and don&#8217;t let yourself be swayed by all this hype. Test things yourself, don&#8217;t over-trust outputs from a tool, come up with your own solutions and adopt what works.</p>



<p class="wp-block-paragraph">My next blog post in this series will also be about agentic coding tools, in this case, Claude Code! Are you using AI coding agents? I&#8217;d love to hear from you what your experience has been. Leave a comment and let&#8217;s chat 🙂 .</p>



]]></content:encoded>
					
					<wfw:commentRss>https://blogit.create.pt/davidpereira/2026/06/15/how-we-use-agentic-coding-tools-in-our-favor-copilot/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Lessons learned improving code reviews with AI</title>
		<link>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/</link>
					<comments>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/#respond</comments>
		
		<dc:creator><![CDATA[David Pereira]]></dc:creator>
		<pubDate>Fri, 09 Jan 2026 12:44:41 +0000</pubDate>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GenAI]]></category>
		<guid isPermaLink="false">https://blogit.create.pt/?p=13548</guid>

					<description><![CDATA[<p>Table of Contents Introduction I have loved code reviews for years now, and still to this day, I love seeing good open source PRs! When I say good, I mean really great! We have access to tons of open source code, and the greatest PRs are the ones where you can learn a lot from [&#8230;]</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/">Lessons learned improving code reviews with AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Table of Contents</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li>Introduction</li>



<li>Why we started experimenting</li>



<li>Our AI code review journey
<ul style="max-width:960px" class="wp-block-list">
<li>Claude Code</li>



<li>Saving learnings in memory</li>



<li>GitHub Copilot</li>



<li>CodeRabbit and Qodo</li>
</ul>
</li>



<li>Tool of choice
<ul style="max-width:960px" class="wp-block-list">
<li>Improving multi-agent collaboration</li>
</ul>
</li>



<li>Resources</li>



<li>Conclusion</li>
</ul>



<h2 class="wp-block-heading">Introduction</h2>



<p class="wp-block-paragraph">I have loved code reviews for years now, and still to this day, I love seeing good open source PRs! When I say good, I mean really great! We have access to tons of open source code, and the greatest PRs are the ones you can learn a lot from about&nbsp;<strong>how to do it right</strong>. In a sense, this blog post is about just that. This blog post is part of a series where I share how AI is augmenting my work, and what I&#8217;m learning from it. If you&#8217;re interested, you can read the first post here:&nbsp;<a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/" target="_blank" rel="noreferrer noopener">Becoming augmented by AI</a>. In that post, I reference how AI has augmented me with an &#8220;initial code review&#8221;, but now I&#8217;ll go deeper into this topic. I&#8217;ll share our hands-on experience: what works, what doesn&#8217;t, and a healthy dose of my opinions along the way 😄.</p>



<p class="wp-block-paragraph"><strong>Quick disclaimer</strong>: what works for us might not work for you. Your team and coding guidelines are different, and that&#8217;s fine. These are just our honest experiences.</p>



<p class="wp-block-paragraph">With that said, let&#8217;s dive into why we started incorporating AI tools in our code review process.</p>



<h2 class="wp-block-heading">Why we started experimenting</h2>



<p class="wp-block-paragraph">I recently watched this amazing&nbsp;<a href="https://www.youtube.com/watch?v=glfB3KLQR7E" target="_blank" rel="noreferrer noopener">video by CodeRabbit</a>. In our team, code review isn&#8217;t really the bottleneck (yet), but it&#8217;s funny because we are also using AI heavily for feature development and trying to improve&#8230; hummm &#8220;velocity&#8221; 🤣.</p>



<p class="wp-block-paragraph">Anyway, I understand many teams nowadays have seen an increase in the number of PRs created, and that some PRs simply get a blind LGTM.</p>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="300" height="168" src="https://blogit.create.pt/wp-content/uploads/2026/01/giphy.gif" alt="" class="wp-image-13589" style="aspect-ratio:1.785770356097909;width:464px;height:auto" /></figure>



<p class="wp-block-paragraph">Maybe some PRs just have increasingly more AI slop&#8230; which wears down senior engineers tasked to do code review 😅. Not all professionals would&nbsp;<strong>want to do it right</strong>&nbsp;or maybe they just want to ship because their company&#8217;s &#8220;productivity metrics&#8221; incentivize merging more and more PRs 😅. Honestly, it&#8217;s&nbsp;<a href="https://simonwillison.net/2025/Dec/18/code-proven-to-work/" target="_blank" rel="noreferrer noopener">our job to deliver code we have proven to work</a>, I fully agree with Simon Willison. Throwing slop over to the engineers that do code review is unprofessional, just as much as throwing untested features over to QA 😐. In our case, we changed to having a dedicated dev responsible for all code reviews, and we don&#8217;t have that many per day. We simply wanted to improve code quality and reduce bugs, while keeping code review as an educational process for junior engineers.</p>



<p class="wp-block-paragraph">About five months ago, our team started experimenting with AI tools (GitHub Copilot, Claude Code, Codacy, Qodo, and CodeRabbit) to see how they could help us improve our review process without adding a ton of noise. There are more tools we didn&#8217;t try, like Augment Code and Greptile (which has some cool&nbsp;<a href="https://www.greptile.com/benchmarks" target="_blank" rel="noreferrer noopener">benchmarks</a>), but hopefully the lessons we learned will be useful to you either way.</p>






<h2 class="wp-block-heading">Our AI code review journey</h2>



<p class="wp-block-paragraph">We already talked in the last post about our&nbsp;<a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/#custom-instructions" target="_blank" rel="noreferrer noopener">custom instructions</a>, to some extent. Specifically for code review we took a phased approach and started comparing different tools:</p>



<ol style="max-width:965px" class="wp-block-list">
<li>Started with&nbsp;<a href="https://docs.github.com/en/copilot/concepts/agents/code-review" target="_blank" rel="noreferrer noopener">GitHub Copilot Code Review</a></li>



<li>Integrated Claude Code with GitHub and started comparing code reviews from both tools</li>



<li>Added CodeRabbit, Qodo and Codacy to spot differences between them</li>



<li>Refined prompts/instructions/configs for some tools</li>
</ol>



<p class="wp-block-paragraph">We didn&#8217;t invest equal time in all of them, though. Copilot and Claude ended up getting most of our attention, especially since we started using Copilot Code Review (CCR) when it was in public preview. Overall, we experimented with these tools in 30+ PRs, and made 20+ PRs to refine our prompts/instructions/agents.</p>



<h3 class="wp-block-heading">Claude Code</h3>



<p class="wp-block-paragraph">Let&#8217;s go through Claude Code first. Here is a snippet of our&nbsp;<code>code-review</code>&nbsp;Claude Code custom slash command:</p>



<pre class="wp-block-code"><code>---
allowed-tools: Bash(dotnet test), Read, Glob, Grep, LS, Task, Explore, mcp.....
description: Perform a comprehensive code review of the requested PR or code changes, taking into consideration code standards
---

## Role

You are a world-class autonomous code review agent. You operate within a secure GitHub Actions environment.
Your analysis is precise, your feedback is constructive, and your adherence to instructions is absolute.
You do not deviate from your programming. You are tasked with reviewing a GitHub Pull Request.

## Primary Directive

Your sole purpose is to perform a comprehensive and constructive code review of this PR, and post all feedback and suggestions using the **GitHub review system** and provided tools.
All output must be directed through these tools. Any analysis not submitted as a review comment or summary is lost and constitutes a task failure.

## Input data
PR NUMBER: $ARGUMENTS

You MUST follow these steps to review the PR:
1. **Start a review**: Use `mcp__github__create_pending_pull_request_review` to begin a pending review
2. **Get diff information**: Use `mcp__github__get_pull_request_diff` to understand the code changes and line numbers
3. **Get list of files**: If you can't get diff information, use `mcp__github__get_pull_request_files` to get the list of files that were added, removed, and changed in the pull request
4. **Add comments**: Use `mcp__github__add_comment_to_pending_review` for each specific piece of feedback on particular lines
5. **Submit the review**: Use `mcp__github__submit_pending_pull_request_review` with event type "COMMENT" (not "REQUEST_CHANGES") to publish all comments as a non-blocking review

You can find all the code review standards and guidelines that you MUST follow here: `.github/instructions/code-review.instructions.md`

## Output format

**CRITICAL RULE** - DO NOT include compliments, positive notes, or praise in your review comments.
Be thorough but filter your comments aggressively - quality over quantity. Focus ONLY on issues, improvements, and actionable feedback.

**Output Violation Examples** (DO NOT DO THIS):
`The code follows best practices by...`
`Positive changes/notes`

**Important**: Submit as "COMMENT" type so the review doesn't block the PR.</code></pre>



<p class="wp-block-paragraph">Yes, some wording might be weird, like praising the AI with &#8220;You are a world-class&#8221; or &#8220;your adherence to instructions is absolute&#8221;. As with the uppercase &#8220;DO NOT&#8221;, &#8220;IMPORTANT&#8221; and others we mentioned before, I can&#8217;t explain some of this stuff or find enough research that claims this affects how the LLM pays&nbsp;<strong>attention</strong>&nbsp;to instructions. I just experiment and learn, and&nbsp;<a href="https://github.com/google-github-actions/run-gemini-cli/blob/main/examples/workflows/pr-review/gemini-review.toml" target="_blank" rel="noreferrer noopener">Gemini</a>&nbsp;likes to use this phrase for code reviews as well 😄 (as well as 115 other devs on GitHub 😅).</p>



<p class="wp-block-paragraph">To be honest, we still have too much noise in AI PR comments, or just tons of fluff. The bright side is, at least the compliments have kind of disappeared 😅 . You might enjoy getting this:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="831" height="182" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-2.png" alt="" class="wp-image-13558" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-2.png 831w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-300x66.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-768x168.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-2-696x152.png 696w" sizes="(max-width: 831px) 100vw, 831px" /></figure>



<p class="wp-block-paragraph">I don&#8217;t 🤣, especially when 1 PR has 5 of these. I do write praise comments for my team, yes, because positive comments are good&#8230; when they come from a human who knows the other person, IMO. Also, there are many comments that don&#8217;t belong in a PR; they belong in a linter or other tools. We have&nbsp;<a href="https://csharpier.com/docs/About" target="_blank" rel="noreferrer noopener">CSharpier</a>&nbsp;and&nbsp;<a href="https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/overview?tabs=net-10" target="_blank" rel="noreferrer noopener">.NET analyzers</a>&nbsp;for that.</p>



<p class="wp-block-paragraph">It also doesn&#8217;t have the best GitHub integration for now; at least, we&#8217;ve had some problems (<a href="https://github.com/anthropics/claude-code-action/issues/584" target="_blank" rel="noreferrer noopener">400 errors</a>,&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/589" target="_blank" rel="noreferrer noopener">branch 404 errors</a>) with the GitHub action, like&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/548" target="_blank" rel="noreferrer noopener">not having access to GitHub MCP tools</a>, even though we set them in the&nbsp;<code>allowed-tools</code>&nbsp;option.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="782" height="72" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-1.png" alt="" class="wp-image-13557" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-1.png 782w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-300x28.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-768x71.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-1-696x64.png 696w" sizes="(max-width: 782px) 100vw, 782px" /></figure>



<p class="wp-block-paragraph">Anyway, we iterated a lot on instructions and prompts so far, since we use them for both Claude and Copilot. Here is a quick recap of what features we use from Claude Code:</p>



<ul style="max-width:965px" class="wp-block-list">
<li>Sub-agents (custom and built-in)</li>



<li>Built-in&nbsp;<code>/review</code>&nbsp;and&nbsp;<a href="https://www.claude.com/blog/automate-security-reviews-with-claude-code" target="_blank" rel="noreferrer noopener">security review</a>&nbsp;commands</li>



<li>Custom slash commands (<code>code-review.md</code>)</li>



<li>Plugins, specifically&nbsp;<a href="https://github.com/anthropics/claude-code/blob/main/plugins/code-review/commands/code-review.md" target="_blank" rel="noreferrer noopener">code-review plugin</a>&nbsp;authored by Boris Cherny</li>
</ul>



<p class="wp-block-paragraph">We run those 2 built-in commands in parallel, but it&#8217;s just to see if we get any good feedback. Our custom code review slash command already does a good review following our guidelines, plus the &#8220;code-review&#8221; plugin from Boris works very well with parallel agents. We basically went through the famous spiral:</p>



<pre class="wp-block-code"><code>Write CLAUDE.md -&gt; Ask for code review -&gt; Find bad comments and noise we don't want -&gt; Re-write CLAUDE.md and other files -&gt; Do some meta-prompting -&gt; Repeat</code></pre>



<p class="wp-block-paragraph">Like I said, our custom code review prompt/command has evolved over time, and was refined whenever we learned something new. We started with this&nbsp;<a href="https://github.com/anthropics/claude-code-action/issues/60#issuecomment-2952771401" target="_blank" rel="noreferrer noopener">incredible suggestion</a>&nbsp;to use the GitHub MCP. We also searched for other GitHub repos, mostly .NET related, to see how they set up their instructions, in case they have anything particular around code review (e.g. for GitHub Copilot). I find&nbsp;<a href="https://github.com/dotnet/aspire/blob/main/.github/copilot-instructions.md">.NET Aspire</a>&nbsp;to be a super cool real-life example 🙂 . I think a lot of their AI adoption is led by David Fowler. So I often check their PRs to see what we can learn from them, e.g.&nbsp;<a href="https://github.com/dotnet/aspire/pull/13361" target="_blank" rel="noreferrer noopener">this one</a>.</p>



<p class="wp-block-paragraph">Anyway, our prompt was still a bit vague, so we had some chats with Claude, good old meta-prompting 🙂. After a while, Claude suggested a new file that has all the coding standards and bad smells we want to avoid &#8211;&nbsp;<code>code-review.instructions.md</code>. It does live under&nbsp;<code>.github/instructions</code>&nbsp;but it doesn&#8217;t matter, Claude can use it. The bad smells are specific and we see them referenced quite often in our PRs now. Still, we don&#8217;t have a perfect solution for overly large PRs. We simply communicate more often or have more than one dev working in the PR for those cases. When a feature genuinely requires lots of new code, the best way to debate and provide actionable feedback is by talking. Sure, this isn&#8217;t always possible; people are busy or prefer async work. In our team, hopping on a call, or walking through the PR in a demo, helps make large PRs way more digestible. Draft PRs also work somewhat, to get some feedback early on.</p>



<h4 class="wp-block-heading">Avoiding noise comments</h4>



<p class="wp-block-paragraph">Our biggest lesson learned here is to run our custom slash command for code review locally and to use sub-agents. Locally, we can try to provide the proper context for the review; the rest is the agent using tools and doing reasoning. No noise gets sent to GitHub comments because all the back-and-forth is done in the chat, plus right now Claude Code works better locally, not on GitHub Actions. Having sub-agents has been amazing, since the main reason Claude Code uses them is context management. Since we now have a built-in&nbsp;<code>Explore</code>&nbsp;sub-agent, our code review command uses that in order to have Explore sub-agents run in parallel (with Haiku 4.5) and not clog up the main context window.</p>



<p class="wp-block-paragraph">I&#8217;ve learned recently of&nbsp;<a href="https://blog.sshh.io/i/177742847/custom-subagents" target="_blank" rel="noreferrer noopener">other devs using a different workflow</a>, basically leveraging the&nbsp;<code>Task</code>&nbsp;tool for the main agent to spawn sub-agents. Whichever way you want to do it, using a sub-agent that is focused on exploring the codebase and potential impacts of this PR is something I recommend.</p>



<h3 class="wp-block-heading">Saving learnings in memory</h3>



<p class="wp-block-paragraph">Every once in a while, once we&#8217;ve merged a few PRs, we use Claude to improve itself again based on these PRs. This is our prompt:</p>



<pre class="wp-block-code"><code>Please look at the 5 most recent PRs in our GitHub repository, and check for learnings in order to improve the code review workflow. Please ultrathink on this task, so that all necessary memory files are updated taking into account these learnings, like @CLAUDE.md and @.github\instructions\ Focus on seeing code review comments that were good and made it into the codebase afterwards (e.g. coding standards violations). Ignore bad comments that were resolved with a "negative comment" or thumbs down emoji. Ask me clarifying questions before you begin. YOU MUST create a changelog file explaining why you made these edits to instruction files. Each learning must reference a PR that exists. The best is for you to link the exact comment that you used for a given learning</code></pre>



<p class="wp-block-paragraph">At the end of the session, we usually have a few items that are good enough to add. Most are&nbsp;<strong>learnings around bugs</strong>&nbsp;we can catch earlier, some are coding standards. Honestly, a lot of suggestions aren&#8217;t what I want or I just think they won&#8217;t be useful in future code reviews. But doing this has been important for me to also take a step back and think about what we can learn from the work we&#8217;ve already merged. I reflect on it and then discuss with my team. I&#8217;ve seen others also talk about this idea and have a&nbsp;<code>learnings.md</code>, e.g.&nbsp;<a href="https://github.com/nibzard/awesome-agentic-patterns/blob/main/LEARNINGS.md" target="_blank" rel="noreferrer noopener">this repo</a>. At least this process seems better for us than simply using emojis to give feedback, which the&nbsp;<a href="https://www.coderabbit.ai/blog/why-emojis-suck-for-reinforcement-learning" target="_blank" rel="noreferrer noopener">CodeRabbit blog</a>&nbsp;also alludes to 😅.</p>



<h3 class="wp-block-heading">GitHub Copilot</h3>



<p class="wp-block-paragraph">Copilot&#8217;s code review features were super basic in the beginning. We tried and experimented with it a lot when it came out. It only caught nitpicks,&nbsp;<code>console.log</code>&nbsp;and typos, and was really not helpful in any other area. Sure, catching these is good, but a human reviewer catches them in the first pass too. It didn&#8217;t support all languages, so we often got 0 comments or feedback. Then, in the last few months, it became completely different: night and day.</p>



<p class="wp-block-paragraph">If you have seen GitHub Universe, you know <a href="https://dev.to/bolt04/github-universe-2025-recap-9gl" target="_blank" rel="noreferrer noopener">what&#8217;s new</a>. But in case you don&#8217;t know, the GitHub team has invested heavily in Copilot code review and coding agent, and it shows. The code review agent often gets every comment right, and it makes suggestions that are actually based on our instructions and memory files, meaning our PRs follow consistent code style and team conventions (with a link to these&nbsp;<a href="https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions" target="_blank" rel="noreferrer noopener">docs</a>).</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="797" height="356" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-4.png" alt="" class="wp-image-13562" style="aspect-ratio:2.2388195797239607;width:799px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-4.png 797w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-300x134.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-768x343.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-4-696x311.png 696w" sizes="(max-width: 797px) 100vw, 797px" /></figure>



<p class="wp-block-paragraph">And the agent session is somewhat transparent, since you can view it in GitHub Actions now:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="998" height="262" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-3.png" alt="" class="wp-image-13559" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-3.png 998w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-300x79.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-768x202.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-3-696x183.png 696w" sizes="(max-width: 998px) 100vw, 998px" /></figure>



<p class="wp-block-paragraph">I say &#8220;somewhat&#8221; because there are things I can&#8217;t configure, just like Claude Code and most tools, I guess 😅. In the logs I can see the option&nbsp;<code>UseGPT5Model=false</code>, and that it&#8217;s using Sonnet 4.5. There is also this &#8220;MoreSeniorReviews&#8221; flag that I couldn&#8217;t find any info on, and believe me&#8230; I wanted to because it was set to false 🤣 &#8211; the logs show <code>ccr[MoreSeniorReviews=false;EnableAgenticTools=true;EnableMemoryUsage=false...</code></p>



<p class="wp-block-paragraph">Are you telling me there could be a hidden way to get a more senior review&#8230; sign me up! Jokes aside, I couldn&#8217;t find much info on the endpoint&nbsp;<code>api.githubcopilot.com/agents/swe</code>&nbsp;of CAPI (presumably Copilot API) the Autofind agent was calling, and the contents of the&nbsp;<code>ccr/callback</code>&nbsp;saved in&nbsp;<code>results-agent.json</code>. I can only hope some of these options are configurable in the future.</p>



<p class="wp-block-paragraph">I checked the&nbsp;<a href="https://docs.github.com/en/copilot/how-tos/provide-context/use-mcp/extend-copilot-chat-with-mcp#remote-server-configuration-example-with-oauth" target="_blank" rel="noreferrer noopener">MCP docs</a>, hoping to find details about these options, but no luck.</p>



<p class="wp-block-paragraph">Anyway, it also now has access to CodeQL and some linters, which is amazing because we didn&#8217;t have this before. This is how we are able to leverage CodeQL analysis in all our PRs now; we couldn&#8217;t do this with any other AI code review tool. We also see that it calls the tool &#8220;store_comment&#8221; during its session, and only submits the comments to GitHub at the end. This is useful because sometimes it stores a comment when it thinks something is wrong in the implementation, then reads more code into context that invalidates the stored comment, so it no longer submits that comment in the PR. This is much like the CodeRabbit validation agent, and it reduces the amount of noise we get in PRs.</p>
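<p class="wp-block-paragraph">To make the store-then-submit idea concrete, here is a minimal Python sketch. It is not how Copilot is implemented (all I can see is the &#8220;store_comment&#8221; tool name in the logs); the <code>ReviewSession</code> class and its <code>invalidate</code> and <code>submit</code> methods are hypothetical names I made up for illustration.</p>

```python
from dataclasses import dataclass, field


@dataclass
class DraftComment:
    path: str
    line: int
    body: str


@dataclass
class ReviewSession:
    """Hypothetical buffer: comments are stored first and submitted only at the end."""

    drafts: list = field(default_factory=list)

    def store_comment(self, path, line, body):
        # Nothing is posted yet; the draft only lives in the session.
        self.drafts.append(DraftComment(path, line, body))

    def invalidate(self, path, line):
        # Drop drafts that later context proved wrong.
        self.drafts = [d for d in self.drafts if (d.path, d.line) != (path, line)]

    def submit(self):
        # Return the surviving drafts; a real agent would post them to the PR here.
        return list(self.drafts)
```

<p class="wp-block-paragraph">The point of the design is that a draft can be withdrawn at any time before <code>submit</code>, so a comment that turns out to be wrong never reaches the PR.</p>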



<h3 class="wp-block-heading">CodeRabbit and Qodo</h3>



<p class="wp-block-paragraph">Let&#8217;s start with the cool features CodeRabbit has:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Code diagrams in Mermaid</li>



<li>Generates a poem! Yes, a poem for my PR</li>



<li>Summary of changes added to the description</li>
</ul>



<p class="wp-block-paragraph">Now&#8230; I gotta be honest, I don&#8217;t care about any of them 😅. They are cool, but I only glance at the poem or ignore it. I never read or care about the summary; I get one from Copilot and edit it myself. All the code and sequence diagrams I saw generated in our PRs were simply not useful, though a lot of them are from front-end code. I simply don&#8217;t look at them later, and if it makes sense, we update our architecture diagrams once the code is merged. With that said, the code suggestions and feedback are obscenely good. It&#8217;s by far the best AI code review tool when it comes to actionable and valuable feedback/suggestions (by a long shot)! Even though we didn&#8217;t configure&nbsp;<code>.coderabbit.yaml</code>&nbsp;or try to optimize it, CodeRabbit already uses&nbsp;<a href="https://docs.coderabbit.ai/integrations/knowledge-base#code-guidelines:-automatic-team-rules" target="_blank" rel="noreferrer noopener">Claude and Copilot instructions</a>, so the work we did on those was probably used by CodeRabbit. In some of our PRs it caught some nasty bugs and gave super useful feedback. Our team was impressed!</p>



<p class="wp-block-paragraph">The insights CodeRabbit adds during code review piqued my interest. I read a few of their blog posts on context engineering like&nbsp;<a href="https://www.coderabbit.ai/blog/context-engineering-ai-code-reviews" target="_blank" rel="noreferrer noopener">this one</a>, where I found it interesting that there is a separate validation agent before submitting comments. This is probably why they maintain a high signal-to-noise ratio. I also read through the open-source version of CodeRabbit, which has some&nbsp;<a href="https://github.com/coderabbitai/ai-pr-reviewer/blob/main/src/prompts.ts" target="_blank" rel="noreferrer noopener">prompts</a>. I know it&#8217;s old, but it&#8217;s what I have access to. I especially like the instructions that we also have 😅 &#8220;Do NOT provide general feedback, summaries, explanations of changes, or praises for making good additions&#8221;.</p>



<p class="wp-block-paragraph">We basically tried to have Claude and Copilot understand our large codebase, not focusing only on the PR diff. It&#8217;s harder, we still have a lot to improve here.&nbsp;<a href="https://www.coderabbit.ai/blog/how-coderabbit-delivers-accurate-ai-code-reviews-on-massive-codebases" target="_blank" rel="noreferrer noopener">CodeRabbit claims</a>&nbsp;it&#8217;s known to be great at understanding large codebases. I don&#8217;t see any research backing this, just opinions. But yes, we humans don&#8217;t like large PRs either:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="638" height="436" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-7.png" alt="" class="wp-image-13571" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-7.png 638w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-300x205.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-615x420.png 615w, https://blogit.create.pt/wp-content/uploads/2026/01/image-7-218x150.png 218w" sizes="(max-width: 638px) 100vw, 638px" /></figure>



<p class="wp-block-paragraph">In my experience, I couldn&#8217;t find many large PRs that CodeRabbit reviewed much better than Claude Code and Copilot did. But one thing we liked a lot is that it uses&nbsp;<strong>collapsed sections</strong>&nbsp;in markdown very well, for example:</p>



<figure class="wp-block-image size-full"><img decoding="async" width="897" height="457" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-5.png" alt="" class="wp-image-13563" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-5.png 897w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-300x153.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-768x391.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-824x420.png 824w, https://blogit.create.pt/wp-content/uploads/2026/01/image-5-696x355.png 696w" sizes="(max-width: 897px) 100vw, 897px" /></figure>



<p class="wp-block-paragraph">But I mean, we did have cases where we tried to use Claude Code for code review on a PR that had already been reviewed by CodeRabbit, and like ~60% of the context window was comments made by CodeRabbit. All that markdown ain&#8217;t friendly for AI with limited context windows. There were times I swear I could see Claude behind every word CodeRabbit wrote, with the &#8220;You&#8217;re absolutely correct&#8221; 🤣, e.g.</p>



<figure class="wp-block-image size-full is-resized"><img decoding="async" width="782" height="164" src="https://blogit.create.pt/wp-content/uploads/2026/01/image-6.png" alt="" class="wp-image-13564" style="width:836px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2026/01/image-6.png 782w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-300x63.png 300w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-768x161.png 768w, https://blogit.create.pt/wp-content/uploads/2026/01/image-6-696x146.png 696w" sizes="(max-width: 782px) 100vw, 782px" /></figure>



<p class="wp-block-paragraph">But it could be GPT models or whatever, we never truly know what is behind these products 🙂.</p>



<h4 class="wp-block-heading">Qodo</h4>



<p class="wp-block-paragraph">As for Qodo, we liked the fact it checks for compliance and flags violations as non-compliant (no other tool had this built in). This was previously just a bullet point in our markdown file. The code review feedback was good, and sometimes we ended up applying the changes Qodo suggested in its comments. After reading more about what compliance checks Qodo does, we improved our own setup by adding specific instructions to our&nbsp;<code>code-review.instructions.md</code>&nbsp;for ISO 9001, GDPR and others:</p>



<pre class="wp-block-code"><code>## Regulatory Compliance Checks

### Data Protection (GDPR/HIPAA/PCI-DSS)
- Does this code handle PII (Personally Identifiable Information)?
- Are sensitive fields properly encrypted at rest and in transit?
- Is data retention policy followed (deletion after X days)?
- Are audit logs created for data access?
- Is data anonymization/pseudonymization applied where required?

### Security Standards (SOC 2 / ISO 27001)
- Are all external API calls wrapped with proper error handling?
- Is input validation present for all user inputs?
- Are authentication checks present on all sensitive endpoints?
- Are secrets/credentials stored securely (no hardcoding)?
- Is sensitive data logged or exposed in error messages?</code></pre>



<p class="wp-block-paragraph">We kept experimenting with Qodo for longer than CodeRabbit, but the insights and feedback never reached the level of CodeRabbit. It was still a good tool that improved our codebase and sparked good discussions.</p>



<h2 class="wp-block-heading">Tool of choice</h2>



<p class="wp-block-paragraph">Our prompts/instructions can still be improved, of course. We&#8217;ve experimented with different prompts, memory and instruction files. We&#8217;ve also researched how other teams use AI for code review, and how tools like CodeRabbit do context engineering. All of this is because our goal is to continue to improve our software development process and ensure high quality. Adopting new tools is a way of achieving this goal. Given that most AI code review tools have a price tag, we decided to focus on using only one or two tools and optimizing them. Yes, it&#8217;s Claude Code and GitHub Copilot 😄. I basically use 100% of my quota on both Copilot and Claude every month, but I get more requests out of Claude, even though I hit the weekly rate limit every time.</p>



<p class="wp-block-paragraph">We know CodeRabbit is amazing, and these paid AI tools will continue getting better. There is actually a new tool supporting code review we didn&#8217;t use,&nbsp;<a href="https://www.augmentcode.com/product/code-review" target="_blank" rel="noreferrer noopener">Augment Code</a>&nbsp;(these AI companies move so fast 😅). No amount of customizing our setup with Claude or Copilot will reach the same output as these specific code review paid tools. But for us, it makes more sense to pay for one tool, for example, and leverage it in multiple steps of our software development lifecycle.</p>



<h3 class="wp-block-heading">Improving multi-agent collaboration</h3>



<p class="wp-block-paragraph">Claude and Copilot are working very well for our code review process. But like I&#8217;ve been saying, there is work to do. We learned a lot from using each tool, but there are more areas to improve, at least in Claude Code since we have more flexibility there. I&#8217;m currently looking at implementing the &#8220;Debate and Consensus&#8221; multi-agent design pattern (<a href="https://arxiv.org/abs/2406.11776" target="_blank" rel="noreferrer noopener">Google DeepMind paper</a>&nbsp;and&nbsp;<a href="https://arxiv.org/abs/2509.11035" target="_blank" rel="noreferrer noopener">Free-MAD</a>), basically a&nbsp;<a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns#group-chat-orchestration" target="_blank" rel="noreferrer noopener">group chat orchestration</a>. I just want to try it out, I&#8217;m not sure I&#8217;ll have better code reviews by having different agents (e.g. Security, Quality and Performance) debate and review the code through different perspectives. If they run sequentially, the quality agent can have questions for the performance agent, and each can agree or disagree with the reported issues. We can try out the LLM-as-a-Judge as well, to focus on reducing noise and following code quality standards.</p>
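<p class="wp-block-paragraph">Here is a rough sketch of what a sequential debate could look like, assuming each agent is just a function that receives the code plus the issues reported so far and says whether it agrees with each one. The <code>debate</code> function, the <code>min_votes</code> threshold and the three stub agents are hypothetical; real agents would call an LLM, and this simple vote count is not the algorithm from either paper.</p>

```python
from collections import Counter


def debate(code: str, agents: dict, min_votes: int = 2) -> list:
    """Run agents one after another and keep issues backed by at least min_votes agents."""
    votes = Counter()
    reported = []
    for name, agent in agents.items():
        # Each agent sees what earlier agents reported and may agree, disagree, or add issues.
        opinions = agent(code, list(reported))
        for issue, agrees in opinions.items():
            if issue not in reported:
                reported.append(issue)
            if agrees:
                votes[issue] += 1
    return [issue for issue in reported if votes[issue] >= min_votes]


# Stub agents standing in for LLM-backed reviewers (illustration only).
def security_agent(code, seen):
    return {"SQL built by string concatenation": True}


def quality_agent(code, seen):
    opinions = {issue: True for issue in seen}
    opinions["Method is too long"] = True
    return opinions


def performance_agent(code, seen):
    # Only backs the issues it considers relevant.
    return {issue: issue.startswith("SQL") for issue in seen}
```

<p class="wp-block-paragraph">With these stubs, only the issue that at least two agents agree on survives, which is the noise reduction I&#8217;m hoping for. Whether it holds up with real models is exactly what I want to find out.</p>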



<p class="wp-block-paragraph">Anyway, we&#8217;ll continue learning, optimizing, and improving the way we work 🙂.</p>



<h2 class="wp-block-heading">Resources</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li><a href="https://graphite.com/blog/ai-wont-replace-human-code-review" target="_blank" rel="noreferrer noopener">Why AI will never replace human code review</a></li>



<li><a href="https://www.youtube.com/watch?v=-GIiTfKZx6M" target="_blank" rel="noreferrer noopener">AI Code Reviews with CodeRabbit&#8217;s Howon Lee</a></li>



<li><a href="https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report" target="_blank" rel="noreferrer noopener">CodeRabbit report: AI code creates 1.7x more problems</a></li>



<li><a href="https://awesomereviewers.com/reviewers/" target="_blank" rel="noreferrer noopener">Awesome reviewers GH repo</a></li>



<li><a href="https://www.youtube.com/watch?v=nItsfXwujjg" target="_blank" rel="noreferrer noopener">Anthropic’s NEW Claude Code Review Agent (Full Open Source Workflow)</a></li>



<li><a href="https://blog.sshh.io/p/how-i-use-every-claude-code-feature" target="_blank" rel="noreferrer noopener">How I Use Every Claude Code Feature</a></li>
</ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p class="wp-block-paragraph">The number one thing we learned is:&nbsp;<strong>experimentation is king</strong>. Like we talked about before, the Jagged Frontier changes with every model release. Claude Opus 4.5 behaves a bit differently, for example, on&nbsp;<a href="https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices#tool-usage-and-triggering" target="_blank" rel="noreferrer noopener">tool triggering</a>&#8230; maybe we can stop shouting and being aggressive 🤣. We must experiment and keep learning. We can&#8217;t calibrate the prompt once and expect the best result.</p>



<p class="wp-block-paragraph">For now we are quite happy, the human reviewer has more time to focus on design decisions and discuss trade-offs with the author of the PR. I don&#8217;t envision a future where AI does 100% of the code review.</p>



<p class="wp-block-paragraph">If you&#8217;re considering AI for code reviews, my advice is simple: just try it. Pick one tool, run a one-month pilot, and see what happens. The worst case is you turn it off. The best case is that your team becomes augmented and probably improves code quality.</p>



<p class="wp-block-paragraph">My next blog post in this series will be about how we are using agentic coding tools! Are you using AI code review tools? I&#8217;d love to hear what your experience has been. Leave a comment and let&#8217;s chat 🙂.</p>



<p>The post <a href="https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/">Lessons learned improving code reviews with AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogit.create.pt/davidpereira/2026/01/09/lessons-learned-improving-code-reviews-with-ai/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Becoming augmented by AI</title>
		<link>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/</link>
					<comments>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/#respond</comments>
		
		<dc:creator><![CDATA[David Pereira]]></dc:creator>
		<pubDate>Wed, 10 Sep 2025 17:24:13 +0000</pubDate>
				<category><![CDATA[Misc]]></category>
		<category><![CDATA[AI]]></category>
		<category><![CDATA[GenAI]]></category>
		<guid isPermaLink="false">https://blogit.create.pt/?p=13531</guid>

					<description><![CDATA[<p>Table of Contents Introduction We&#8217;re deep into Co-Intelligence in Create IT&#8217;s book club — definitely worth your time! Between that and the endless stream of LLM content online, I&#8217;ve been in full research mode. Still, I can&#8217;t just watch and hear others talk about these tools, I must experiment myself and learn how to use [&#8230;]</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/">Becoming augmented by AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Table of Contents</h2>



<ul style="max-width:1005px" class="wp-block-list">
<li>Introduction</li>



<li>The &#8220;Jagged Frontier&#8221; concept</li>



<li>Becoming augmented by AI
<ul style="max-width:960px" class="wp-block-list">
<li>AI as a co-worker</li>



<li>AI as a co-teacher</li>
</ul>
</li>



<li>My augmentation list
<ul style="max-width:960px" class="wp-block-list">
<li>Custom instructions</li>



<li>Meta-prompting</li>
</ul>
</li>



<li>Resources</li>



<li>Conclusion</li>
</ul>



<h2 class="wp-block-heading">Introduction</h2>



<p class="wp-block-paragraph">We&#8217;re deep into <a href="https://www.amazon.com/-/pt/dp/059371671X/ref=sr_1_1">Co-Intelligence</a> in Create IT&#8217;s book club — definitely worth your time! Between that and the endless stream of LLM content online, I&#8217;ve been in full research mode. Still, I can&#8217;t just watch and hear others talk about these tools, I must experiment myself and learn how to use them for my use cases.</p>



<p class="wp-block-paragraph">Software development is complex. My job isn&#8217;t just churning out code, but there are many concepts in this book that we&#8217;ve internalized and started adopting. In this post, I&#8217;ll share my opinions and some of the practical guidelines our team has been following to be augmented by AI.</p>



<h2 class="wp-block-heading">The &#8220;Jagged Frontier&#8221; concept</h2>



<p class="wp-block-paragraph">The Jagged Frontier described by the author Ethan Mollick is an amazing concept in my opinion. It&#8217;s where tasks that appear to be of similar difficulty may either be performed better or worse by humans using AI. Due to the &#8220;jagged&#8221; nature of the frontier, the same knowledge workflow of tasks can have tasks on both sides of the frontier according to a <a href="https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf">publication where the author took part</a>.</p>



<p class="wp-block-paragraph">This leads to the&nbsp;<strong>Centaur vs. Cyborg</strong>&nbsp;distinction which is really interesting. Using both approaches (deeply integrated collaboration and separation of tasks) seems to be the goal to achieve co-intelligence. One very important Cyborg practice seen in that publication is &#8220;push-back&#8221; and &#8220;demanding logic explanation&#8221;, meaning we disagree with the AI output, give it feedback, and ask it to reconsider and explain better. Or as I often do, ask it to double-check with official documentation that what it&#8217;s telling me is correct. It&#8217;s also important to understand that this frontier can change as these models improve. Hence, the focus on experimentation to understand where the Jagged Frontier lies in each LLM. It&#8217;s definitely knowledge that everyone in the industry right now wants to acquire (maybe share it afterwards 😅).</p>



<h2 class="wp-block-heading">Becoming augmented by AI</h2>



<p class="wp-block-paragraph">I&#8217;m aware of the marketed productivity gains, where&nbsp;<a href="https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/">GitHub Copilot usage makes devs 55% faster</a>, and other studies that have been posted about GenAI increasing productivity. I&#8217;m also aware of the studies claiming the opposite 😄 like the&nbsp;<a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/">METR study</a>&nbsp;showing AI makes devs&nbsp;<strong>19% slower</strong>. However, I don&#8217;t see 55% productivity gains for myself, and I don&#8217;t think it makes me slower either.</p>



<p class="wp-block-paragraph">In my opinion, productivity gains aren&#8217;t measured by producing more code. Number of PRs? Nope. Acceptance rate for AI suggestions? Definitely not! I firmly believe the less code, the better. The less slop the better too 😄. I&#8217;m currently focused on assessing&nbsp;<strong>DORA metrics</strong>&nbsp;and others for my team, because we want to measure whether AI-assisted coding, and the other ways we use AI as an augmentation tool, actually improve those metrics or make them worse. The rest of the marketing and hype doesn&#8217;t matter.</p>



<p class="wp-block-paragraph">Ethan Mollick provides numerous examples and research on how professionals across industries are already leveraging AI tools, like the Cyborg approach. But if we focus on our software industry, what does it mean for a tech lead to be augmented by AI? What tasks would be good to involve an AI in without compromising quality?</p>



<h3 class="wp-block-heading">AI as a co-worker</h3>



<p class="wp-block-paragraph">For a tech lead that works with Azure services, an important skill is to know how to leverage the correct Azure services to build, deploy, and manage a scalable solution. So it becomes very useful to have an AI partner that can have a conversation about this, for example about Azure Durable Functions. This conversation can be shallow, and not get all the implementation details 100% correct. That&#8217;s okay, because the tech lead (and any dev 😅) also needs to exhibit&nbsp;<strong>critical thinking</strong>&nbsp;and evaluate the AI responses.&nbsp;<strong>This is not a skill we want to delegate</strong>&nbsp;to these models, at least in my opinion and in the&nbsp;<a href="https://www.oneusefulthing.org/p/against-brain-damage">author&#8217;s opinion</a>. There is a relevant&nbsp;<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2025/01/lee_2025_ai_critical_thinking_survey.pdf">research paper</a>&nbsp;about this by Microsoft as well.</p>



<p class="wp-block-paragraph">The goal can simply be to have a conversation with a co-worker to spark some new ideas or possible solutions that we haven&#8217;t thought of. Using AI for ideation is a great use case, not just for engineering, but for product features too like UI/UX, important metrics to capture, etc. If it generates 20 ideas, there is a higher chance you find the bad ones, filter them out, and clear your mind or steer it into better ideas. Here is an example to get some ideas on fixing a recurring exception:</p>


<div class="wp-block-image">
<figure class="aligncenter size-large is-resized"><img decoding="async" width="1024" height="810" src="https://blogit.create.pt/wp-content/uploads/2025/09/image-1024x810.png" alt="" class="wp-image-13535" style="width:761px;height:auto" srcset="https://blogit.create.pt/wp-content/uploads/2025/09/image-1024x810.png 1024w, https://blogit.create.pt/wp-content/uploads/2025/09/image-300x237.png 300w, https://blogit.create.pt/wp-content/uploads/2025/09/image-768x607.png 768w, https://blogit.create.pt/wp-content/uploads/2025/09/image-531x420.png 531w, https://blogit.create.pt/wp-content/uploads/2025/09/image-696x550.png 696w, https://blogit.create.pt/wp-content/uploads/2025/09/image-1068x844.png 1068w, https://blogit.create.pt/wp-content/uploads/2025/09/image.png 1185w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Example of using AI to get multiple options</figcaption></figure>
</div>





<p class="wp-block-paragraph">It asks clarifying questions so that I can give it more useful context. Then I can see the response, iterate, or ask for more ideas, etc. I almost always set these instructions for any LLM:</p>



<pre class="wp-block-code"><code>Ask clarifying questions before giving an answer. Keep explanations not too long. Try to be as insightful as possible, and remember to verify if a solution can be implemented when answering about Azure and architecture in general.
It's also very important for you to verify if there is official documentation that supports your claims and statements. Please find official documentation supporting your claims, before responding to a user. If there isn't documentation confirming your statement, don't include it in the response.</code></pre>



<p class="wp-block-paragraph">That is also why it searches for docs. I&#8217;ve gotten way too many statements in LLM responses where, when I follow up, the model realizes it made an error or an assumption. When I ask it further about that sentence that it just gave me, I just get &#8220;You&#8217;re right &#8211; I was wrong about that&#8221;&#8230; Don&#8217;t become over-reliant on these tools 😅.</p>



<h3 class="wp-block-heading">AI as a co-teacher</h3>



<p class="wp-block-paragraph">With that said, the tech lead and senior devs are also responsible for upskilling their team by sharing knowledge, best practices, challenging juniors with more complex tasks, etc. And this part of the job isn&#8217;t that simple; it&#8217;s hard to be a force multiplier that improves everyone around you. So, what if the tech lead could use AI in this way, by creating&nbsp;<a href="https://code.visualstudio.com/docs/copilot/customization/prompt-files">reusable prompts</a>, documentation, and custom agents? How about the tech lead uses AI as a co-teacher, and then shares how to do it with the rest of the team? All of these can then help onboard juniors and help them understand our codebase and our domain. The&nbsp;<a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code Best practices post</a>&nbsp;also references onboarding as a good use case that helps Anthropic engineers:</p>



<p class="wp-block-paragraph"><em>&#8220;At Anthropic, using Claude Code in this way has become our core onboarding workflow, significantly improving ramp-up time and reducing load on other engineers.&#8221;</em></p>



<p class="wp-block-paragraph">A lot of onboarding time is spent on understanding the business logic and then how it&#8217;s implemented. For juniors, it&#8217;s also about the design patterns or codebase structure. So I really think this is a net-positive for the whole team.</p>



<h2 class="wp-block-heading">My augmentation list</h2>



<p class="wp-block-paragraph">It might not be much, but these are essentially the tasks where I&#8217;m augmented by AI:</p>



<p class="wp-block-paragraph"><strong>Technical</strong>:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li><strong>Initial</strong>&nbsp;code review (e.g. nitpicks, typos), some stuff I should really just automate 😅</li>



<li>Generate summaries for the PR description</li>



<li>Architectural discussions, including trade-off and risk analysis
<ul style="max-width:960px" class="wp-block-list">
<li>Draft an ADR (Architecture decision record) based on my analysis and arguments</li>
</ul>
</li>



<li>Co-Teacher and Co-Worker
<ul style="max-width:960px" class="wp-block-list">
<li>&#8220;Deep Research&#8221; and discussion about possible solutions</li>



<li>Learn new tech with analogies or specific Azure features</li>



<li>Find new sources of information (e.g. blog posts, official docs, conference talks)</li>
</ul>
</li>



<li>Troubleshooting for specific infrastructure problems
<ul style="max-width:960px" class="wp-block-list">
<li>Generating KQL queries (e.g. rendering charts, analyzing traces &amp; exceptions &amp; dependencies)</li>
</ul>
</li>



<li>Refactoring and documentation suggestions</li>



<li>Generation of new unit tests given X scenarios</li>
</ul>



<p class="wp-block-paragraph"><strong>Non-technical</strong></p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Summarizing book chapters/blog posts or videos (e.g. NotebookLM)</li>



<li>Role play in various scenarios (e.g. book discussions)</li>
</ul>



<p class="wp-block-paragraph">Of course, we also need to talk about the tasks that fall outside the Jagged Frontier. Again, these can vary from person to person. From my usage and experiments so far, these are the tasks that currently fall outside the frontier:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li>Being responsible for technical support tickets, where a customer encountered an error or has a question about our product. This involves answering the ticket, asking clarifying questions when necessary, opening up tickets on a 3rd party that are related to this issue, and then resolving the issue.</li>



<li>Deep valuable code review. This includes good insights, suggestions, and knowledge sharing to improve the PR author&#8217;s skills. <a href="https://www.coderabbit.ai/">CodeRabbit</a> does often give valuable code reviews, way better than any other solution. Still not the same as human review 🙂</li>



<li>Development of a v0 (or draft) for new complex features</li>



<li>Fixing bugs that require business domain knowledge</li>
</ul>



<p class="wp-block-paragraph">Delegating some of those tasks would be cool, at least 50% 😄, while our engineering team focuses on other tasks. But oh well, maybe that day will come.</p>



<h2 class="wp-block-heading">AI-assisted coding</h2>



<p class="wp-block-paragraph">AI-assisted coding can be very helpful on some tasks, and lately my goal is to increase the number of tasks AI can assist me with. In our team, we&#8217;ve read&nbsp;<a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code Best practices</a>&nbsp;in order to learn and see what fits best for our use case. Then we dove deeper into some topics that post references. For example,&nbsp;<a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/extended-thinking-tips">these docs</a>&nbsp;were very useful for learning about Claude&#8217;s extended thinking feature, complementing the usage of &#8220;think&#8221; &lt; &#8220;think hard&#8221; &lt; &#8220;think harder&#8221; &lt; &#8220;ultrathink&#8221;. We also found&nbsp;<a href="https://simonwillison.net/2025/Apr/19/claude-code-best-practices/">this post by Simon Willison</a>&nbsp;about this feature interesting. In most tasks, using an iterative approach, just like normal software development, is indeed way better than one-shotting with the perfect prompt. Still, if it takes too many iterations, as with some bugfixes that were too complex because it&#8217;s hard to pinpoint the location of the bug, then performance degrades and the overall result becomes bad (infinite load spinner of death 🤣).</p>



<p class="wp-block-paragraph">Before we can use AI-assisted coding on more complex tasks, we need to improve the output quality. So we&#8217;ve invested a lot of time in fine-tuning custom instructions and meta-prompting. Let&#8217;s talk about these two.</p>



<h3 class="wp-block-heading" id="custom-instructions">Custom Instructions</h3>



<p class="wp-block-paragraph">According to Copilot docs, instructions should be short, self-contained statements. Most principles in&nbsp;<a href="https://learn.microsoft.com/en-us/training/modules/introduction-prompt-engineering-with-github-copilot/2-prompt-engineering-foundations-best-practices">prompt engineering</a>&nbsp;are about being short, specific, and making sure our critical instructions are something the model pays special attention to. As everyone says, the context window is very important, so it&#8217;s really good if we can keep the instruction file to about 200 lines. The longer our instructions are, the greater the risk that the LLM won&#8217;t follow them, since it can pay more attention to other tokens or forget relevant instructions. With that said, keeping instructions short is also a challenge when we use the few-shot prompting technique and add more examples.</p>
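<p class="wp-block-paragraph">One small thing that helps us stay honest about length is a check like the sketch below. The <code>check_instruction_budget</code> function is a hypothetical helper, and the 200-line default is just the rough target mentioned above, not a limit documented by any tool.</p>

```python
def check_instruction_budget(text: str, max_lines: int = 200) -> tuple:
    """Return (line_count, within_budget) for the contents of an instruction file."""
    count = len(text.splitlines())
    # The budget is a team convention, so the caller decides what to do when it is exceeded.
    return count, max_lines >= count
```

<p class="wp-block-paragraph">It could run in CI against the instruction files, so growth from adding few-shot examples becomes a visible decision instead of a slow drift.</p>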



<p class="wp-block-paragraph">To build our custom instructions, we used C# and Blazor files from&nbsp;<a href="https://github.com/github/awesome-copilot/tree/main">the awesome-copilot repo</a>&nbsp;and other sources of inspiration like&nbsp;<a href="https://parahelp.com/blog/prompt-design">parahelp prompt design</a>&nbsp;to get a first version. We wanted to know what techniques other teams use. Then we made specific edits to follow our own guidelines and removed rules specific to explaining concepts, etc. We also added some&nbsp;<strong>capitalized words</strong>&nbsp;that are common in system prompts or commands, like IMPORTANT, NEVER, ALWAYS, MUST. The word IMPORTANT also appears at the end of the instructions, to try to&nbsp;<strong>refocus</strong>&nbsp;the model&#8217;s attention on our coding standards:</p>



<pre class="wp-block-code"><code>IMPORTANT: Follow our coding standards when implementing features or fixing bugs. If you are unsure about a specific coding standard, ask for clarification.</code></pre>



<p class="wp-block-paragraph">I&#8217;m not 100% sure how this capitalization works, or why it works&#8230; and I have not found docs/evidence/research on this. All I know is that capitalized words are split into different tokens than their lowercase versions. My guess is that the model pays more attention to them because, in the training data, these words are capitalized when something is important. I do wish Microsoft, OpenAI, and Anthropic included this topic on capitalization in their prompt engineering docs/tutorials.</p>



<p class="wp-block-paragraph">The IMPORTANT line is at the end of our file because&nbsp;<a href="https://huggingface.co/papers/2307.03172">research suggests that the beginning and end of a prompt</a>&nbsp;are what the LLM pays more attention to and finds more relevant. Some middle parts are &#8220;meh&#8221; and can be forgotten.&nbsp;<a href="https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering?tabs=chat#repeat-instructions-at-the-end">Microsoft docs</a>&nbsp;say essentially the same and call it &#8220;<strong>recency bias</strong>&#8221;. In most prompts we see, a section like this exists at the end to refocus the LLM&#8217;s attention.</p>
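<p class="wp-block-paragraph">The &#8220;repeat it at the end&#8221; idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how Copilot loads instruction files: <code>build_instructions</code> is a hypothetical helper and the two sample rules are made up. Only the closing reminder is the one quoted earlier in this post:</p>

```python
REMINDER = (
    "IMPORTANT: Follow our coding standards when implementing features or "
    "fixing bugs. If you are unsure about a specific coding standard, ask "
    "for clarification."
)


def build_instructions(rules: list[str], reminder: str = REMINDER) -> str:
    """Join rules as bullets and place the critical reminder last."""
    bullets = [f"- {rule.strip()}" for rule in rules if rule.strip()]
    # A blank line separates the rules from the reminder, which is always
    # the final thing in the assembled text.
    return "\n".join(bullets + ["", reminder])


if __name__ == "__main__":
    # Hypothetical rules, for illustration only.
    rules = [
        "ALWAYS use async/await for I/O-bound calls.",
        "NEVER swallow exceptions silently.",
    ]
    print(build_instructions(rules))
```

<p class="wp-block-paragraph">Generating the file this way means the reminder stays last no matter how many rules get added in the middle.</p>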



<h3 class="wp-block-heading">Meta-prompting</h3>



<p class="wp-block-paragraph">Our goal also isn&#8217;t to have the perfect custom instructions and prompt, since refining it later with an iterative/conversational approach works well. But we came across the concept of&nbsp;<a href="https://cookbook.openai.com/examples/enhance_your_prompts_with_meta_prompting">meta-prompting</a>, a term that is becoming more popular. Basically, we asked Claude how to improve our prompt, and it gave us some cool ideas to improve our instructions/reusable prompts.</p>
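<p class="wp-block-paragraph">As a rough sketch of what that request can look like, the Python below wraps an existing prompt in a meta-prompt that asks a model to critique and rewrite it. The template wording and the name <code>build_meta_prompt</code> are assumptions for illustration, not the exact prompt we sent to Claude, and no API call is made here. The returned string is what you would paste into a chat or send through whichever client you use:</p>

```python
META_TEMPLATE = """You are an expert prompt engineer.
Review the prompt between the ---PROMPT--- markers and improve it.

Problems we observed with the current prompt:
{problems}

---PROMPT---
{prompt}
---PROMPT---

Reply with: (1) a short critique, (2) the full rewritten prompt."""


def build_meta_prompt(prompt: str, problems: list[str]) -> str:
    """Fill the template with the prompt under review and its known issues."""
    listed = "\n".join(f"- {p}" for p in problems) or "- (none reported)"
    # The prompt is passed as a format value, so braces inside it are kept as-is.
    return META_TEMPLATE.format(problems=listed, prompt=prompt.strip())


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    print(build_meta_prompt(
        "Follow our coding standards.",
        ["Too vague about which standards apply."],
    ))
```

<p class="wp-block-paragraph">Listing the observed problems gives the model something specific to fix, instead of only asking it to &#8220;make this better&#8221;.</p>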



<p class="wp-block-paragraph">Don&#8217;t forget to use LLMs with caution, though&#8230; I keep getting &#8220;You&#8217;re absolutely right&#8230;&#8221; and it&#8217;s annoying how sycophantic it often is 😅</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" width="696" height="398" src="https://blogit.create.pt/wp-content/uploads/2025/09/image-3.png" alt="" class="wp-image-13542" srcset="https://blogit.create.pt/wp-content/uploads/2025/09/image-3.png 696w, https://blogit.create.pt/wp-content/uploads/2025/09/image-3-300x172.png 300w" sizes="(max-width: 696px) 100vw, 696px" /></figure>
</div>


<p class="wp-block-paragraph">The quality of the output is most likely affected by the complexity of the task I&#8217;m working on too. Prompting skills only go so far. From what I&#8217;ve researched and learned to date, I can say there is a learning curve for understanding LLMs. So we need to continue experimenting and learning about the layers between our prompt and the output we see.</p>



<h2 class="wp-block-heading">Resources</h2>



<p class="wp-block-paragraph">This is not an exhaustive list by any means, just some resources I find very useful:</p>



<ul style="max-width:1005px" class="wp-block-list">
<li><a href="https://www.youtube.com/watch?v=EWvNQjAaOHw&amp;t=7238s">Andrej Karpathy &#8211; How I use LLMs</a></li>



<li><a href="https://www.youtube.com/watch?v=LCEmiRjPEtQ">Andrej Karpathy: Software Is Changing (Again)</a>
<ul style="max-width:960px" class="wp-block-list">
<li>Related to this is&nbsp;<a href="https://natesnewsletter.substack.com/p/software-30-vs-ai-agentic-mesh-why">this post from Nate Jones</a></li>
</ul>
</li>



<li><a href="https://www.youtube.com/watch?v=tbDDYKRFjhk">Does AI Actually Boost Developer Productivity? (100k Devs Study) &#8211; Yegor Denisov-Blanch, Stanford</a></li>



<li><a href="https://www.anthropic.com/engineering/claude-code-best-practices">Claude Code: Best practices for agentic coding</a></li>



<li><a href="https://zed.dev/blog/why-llms-cant-build-software">Why LLMs Can&#8217;t Really Build Software</a></li>



<li><a href="https://www.youtube.com/watch?v=-1yH_BTKgXs">Is AI the Future of Software Development, or Just a new Abstraction? Insights from Kelsey Hightower</a></li>



<li><a href="https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide">GPT-5 prompting guide</a></li>
</ul>



<h2 class="wp-block-heading">Conclusion</h2>



<p class="wp-block-paragraph">I&#8217;ve enjoyed learning and improving myself over the years. But with GenAI I now feel like I could learn a lot more and improve myself even further, since I&#8217;m choosing to use these tools as&nbsp;<strong>augmentation tools</strong>. Hopefully, this article motivates you to pursue AI augmentation for yourself. It&#8217;s okay to be skeptical about all the hype you watch and hear around these tools. It&#8217;s a good mechanism to not fall for all the sales pitches and fluff that CEOs and others in the industry talk about. Just don&#8217;t let your skepticism prevent you from learning, experimenting, building your own opinion, and finding ways of improving your work 🙂.</p>



<p class="wp-block-paragraph">Still&#8230; I can&#8217;t deny my curiosity to know more about how these systems work underneath. How is fine-tuning done exactly? How does post-training work? Can these models emit telemetry (logs, traces, metrics) that we can observe? Why does capitalization (e.g. IMPORTANT, MUST) or setting a&nbsp;<a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts">role/persona</a>&nbsp;improve prompts? Can we really not have access to a high-level tree with the weights the LLM uses to correlate tokens, and use it to justify why a given output was produced? Or why an instruction given as input was not followed? It&#8217;s okay to just have a basic understanding and know about the new abstractions we have with these LLMs. But knowing how that abstraction works leads to knowing how to transition to automation.</p>



<p class="wp-block-paragraph">I will keep searching and learning more in order to answer these questions or find engineers in the industry who have answered them. Especially around&nbsp;<strong>interpretability research</strong>, which is amazing!!! I recommend reading this research, for example &#8211;&nbsp;<a href="https://www.anthropic.com/research/tracing-thoughts-language-model">Tracing the thoughts of a large language model</a>. Hope you enjoyed reading. Feel free to share in the comments below how you use AI to augment yourself 🙂.</p>
<p>The post <a href="https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/">Becoming augmented by AI</a> appeared first on <a href="https://blogit.create.pt">Blog IT</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blogit.create.pt/davidpereira/2025/09/10/becoming-augmented-by-ai/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
