<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ethanhn.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ethanhn.com/" rel="alternate" type="text/html" /><updated>2026-05-19T06:43:53-07:00</updated><id>https://ethanhn.com/feed.xml</id><title type="html">Ethan Nguyen</title><subtitle>AI/ML engineer at Google Public Sector focused on frontier AI systems, operational trust, and high-trust environments.</subtitle><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><entry><title type="html">Optimize Against the Real World</title><link href="https://ethanhn.com/posts/2026/05/optimize-against-real-world/" rel="alternate" type="text/html" title="Optimize Against the Real World" /><published>2026-05-19T00:00:00-07:00</published><updated>2026-05-19T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2026/05/optimize-against-real-world</id><content type="html" xml:base="https://ethanhn.com/posts/2026/05/optimize-against-real-world/"><![CDATA[<p>Tighten the feedback loops with the real world.</p>

<p>Companies call it bias for action. Builders call it shipping early.</p>

<p>The principle is the same:</p>

<blockquote>
  <p>Run more experiments and the world will give you directions.</p>
</blockquote>

<p>It is easy to polish the product in your head.</p>

<p>People will want this next.<br />
The workflow should be structured like this.<br />
The feature should behave this way.<br />
This edge case probably matters.<br />
This part should be rebuilt before anyone sees it.</p>

<p>Maybe.</p>

<p>Until someone has run the experiment, each of these is an untested hypothesis.</p>

<p>That is the trap. The work feels productive because the team is still thinking, designing, debating, and improving. But before the product is in front of real users, many of those improvements are aimed at a target that may not exist.</p>

<p>Optimization is only useful when the target is real.</p>

<h2 id="the-product-in-your-head-is-not-the-product-in-the-users-hands">The product in your head is not the product in the user’s hands</h2>

<p>Before users touch a product, the builder has a clean version of the workflow in mind.</p>

<p>The user has a problem.<br />
The product explains what it does.<br />
The user understands the next step.<br />
The feature creates value.<br />
The workflow continues.</p>

<p>That is the imagined path.</p>

<p>The real path is usually messier.</p>

<p>A user hesitates on a step the team thought was obvious. They ignore the feature that felt central. They try to use the product for something adjacent but more painful. They ask a question the interface should have answered. They create a workaround that reveals the real job to be done.</p>

<p>That is where the useful information is.</p>

<p>Not in the abstract version of the product. In the moments where the product meets the user’s actual context.</p>

<p>This is why shipping early matters.</p>

<p>Not because quality does not matter.</p>

<p>Quality matters a lot.</p>

<p>But the first version is not supposed to prove you were right. It is supposed to show you what is true.</p>

<p>A rough product in front of real users will teach you things another week of internal debate cannot.</p>

<h2 id="feedback-changes-the-target">Feedback changes the target</h2>

<p>A common mistake is treating feedback as something that improves the current plan.</p>

<p>Sometimes it does.</p>

<p>More often, feedback changes the plan.</p>

<p>The team may think the problem is speed. The user may care more about confidence. The team may think the key feature is automation. The user may need better control. The team may think the workflow starts with a blank input. The user may actually start from an artifact, a document, a spreadsheet, a ticket, or a conversation already in progress.</p>

<p>Those differences matter because they change what the product should become.</p>

<p>A good feedback loop does not just answer:</p>

<blockquote>
  <p>How do we make this version better?</p>
</blockquote>

<p>It also answers:</p>

<blockquote>
  <p>Are we solving the right problem in the right shape?</p>
</blockquote>

<p>That is a different question.</p>

<p>The first question creates polish.</p>

<p>The second creates product judgment.</p>

<h2 id="watch-what-users-do">Watch what users do</h2>

<p>Users are often better at revealing problems than describing solutions.</p>

<p>That is not a criticism. It is just how product discovery works.</p>

<p>A user may ask for a button, but the deeper signal is the repeated action behind the request. They may ask for an export, but the real issue is trust, review, or handoff. They may say the system is confusing, but the important detail is the exact point where their mental model broke.</p>

<p>Watching the user matters because behavior carries information that summaries miss.</p>

<p>You find out:</p>

<ul>
  <li>where they hesitate</li>
  <li>what they skip</li>
  <li>what they repeat</li>
  <li>what they misunderstand</li>
  <li>what they try before asking for help</li>
  <li>what they expected the product to know</li>
  <li>what work they still do outside the product</li>
  <li>what would make the next attempt easier</li>
</ul>

<p>Those are not cosmetic details.</p>

<p>They are instructions from reality.</p>

<h2 id="ship-small-enough-to-learn">Ship small enough to learn</h2>

<p>The goal is not to ship something careless.</p>

<p>The goal is to ship something small enough to learn from.</p>

<p>A useful early version should be narrow, honest, and observable.</p>

<p>Narrow, because a broad product creates too many ambiguous signals. If everything is possible, it becomes harder to tell what actually matters.</p>

<p>Honest, because the product should not pretend to be more mature than it is. Early users can tolerate rough edges if the value is clear and the boundary is explicit.</p>

<p>Observable, because the team needs to understand what happened. Where did users slow down? What did they try? What failed silently? What surprised them? What did they ask for after the first attempt?</p>

<p>The point of the first version is not completeness.</p>

<p>The point is contact.</p>

<p>You want the smallest useful product that can survive a real interaction and produce real signal.</p>

<h2 id="the-ai-product-version">The AI product version</h2>

<p>This matters even more for AI products.</p>

<p>A model can be capable and the product can still be wrong.</p>

<p>The model may answer correctly, but the answer may arrive at the wrong point in the workflow. It may produce a useful summary, but not the evidence a reviewer needs. It may automate a task, but remove the control that makes the user comfortable trusting the result. It may make a demo look impressive while still failing the messy path where real work happens.</p>

<p>The hard part is not only whether the system can produce an answer.</p>

<p>The hard part is whether it helps a real person make progress.</p>

<p>That means the product has to learn from contact with real users:</p>

<ul>
  <li>what context they actually have</li>
  <li>what uncertainty they need surfaced</li>
  <li>what evidence they trust</li>
  <li>what decisions they are trying to make</li>
  <li>what failure modes they can tolerate</li>
  <li>what parts of the workflow should stay human</li>
  <li>what output is useful enough to change the next action</li>
</ul>

<p>Those details rarely become obvious from a whiteboard.</p>

<p>They become obvious when someone tries to use the system and gets stuck.</p>

<h2 id="capability-is-not-the-same-as-fit">Capability is not the same as fit</h2>

<p>A lot of AI products start from the question:</p>

<blockquote>
  <p>Can the model do this?</p>
</blockquote>

<p>That question matters, but it is incomplete.</p>

<p>The better product question is:</p>

<blockquote>
  <p>Can this system fit into the user’s workflow well enough to change what happens next?</p>
</blockquote>

<p>That is where many products break.</p>

<p>The capability exists, but the workflow fit is weak. The output is impressive, but hard to inspect. The interface is flexible, but does not guide the user toward the right action. The system works in the happy path, but does not expose uncertainty, missing context, or failure clearly enough for real use.</p>

<p>In high-trust workflows, this gap becomes even more important.</p>

<p>The product has to preserve human judgment. It has to make evidence visible. It has to help the user understand what changed, what is missing, and what should be reviewed.</p>

<p>That kind of fit is hard to design from a distance.</p>

<p>You have to watch the work happen.</p>

<h2 id="optimize-after-reality-responds">Optimize after reality responds</h2>

<p>There is a time to optimize.</p>

<p>Once the team knows what matters, optimization becomes powerful. Speed matters when you know which step blocks the workflow. Interface polish matters when you know which decision the user is trying to make. Automation matters when you know which repeated action is actually painful. Reliability matters when you know which failures users cannot tolerate.</p>

<p>Before that, optimization can become a way to avoid learning.</p>

<p>It can keep the team busy while delaying the moment of truth.</p>

<p>The better sequence is simple:</p>

<ol>
  <li>Form a hypothesis.</li>
  <li>Ship the smallest useful version.</li>
  <li>Watch real users interact with it.</li>
  <li>Identify what reality made obvious.</li>
  <li>Optimize against that.</li>
</ol>

<p>The product improves because the feedback loop is tight.</p>

<p>The team is no longer guessing from inside the building. It is learning from contact with the environment the product is supposed to serve.</p>

<h2 id="the-broader-lesson">The broader lesson</h2>

<p>Product judgment is not just having strong opinions.</p>

<p>It is knowing how quickly to let reality update those opinions.</p>

<p>That requires shipping before every detail feels settled. It requires watching users without defending the product in your head. It requires treating confusion, hesitation, and workarounds as signal instead of embarrassment.</p>

<p>Do not over-optimize before you know what matters.</p>

<p>Ship the smallest useful version.</p>

<p>Watch what users actually do.</p>

<p>Then optimize against the real world.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="Product" /><category term="AI Products" /><category term="Product Judgment" /><category term="User Feedback" /><summary type="html"><![CDATA[Do not over-optimize before you know what matters. Ship the smallest useful version, watch real users, and let reality tell you where to improve.]]></summary></entry><entry><title type="html">FieldDesk: Agentic Workflow Infrastructure for Administrative Readiness</title><link href="https://ethanhn.com/posts/2026/04/fielddesk-administrative-readiness/" rel="alternate" type="text/html" title="FieldDesk: Agentic Workflow Infrastructure for Administrative Readiness" /><published>2026-04-30T00:00:00-07:00</published><updated>2026-04-30T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2026/04/fielddesk-administrative-readiness</id><content type="html" xml:base="https://ethanhn.com/posts/2026/04/fielddesk-administrative-readiness/"><![CDATA[<p>At the SCSP 2026 National Security Technology Hackathon in Washington, D.C., I built FieldDesk, a prototype for military administrative readiness. FieldDesk won the DC first-phase GenAI.mil track qualifier, which advanced the project from the local first round.</p>

<p>The useful part was not the award. It was the technical question the prototype forced:</p>

<blockquote>
  <p>What should exist around a model before it can safely help with high-trust administrative work?</p>
</blockquote>

<p>FieldDesk is my answer to that question in one workflow.</p>

<p>It is not a regulation chatbot. It is an agentic workflow system that takes mission intent, searches controlled sources, maps evidence to requirements, catches gaps, runs deterministic checks, and produces a review-ready action package for a human.</p>

<p>The product surface is military administration. The technical pattern is broader:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>controlled sources
  -&gt; tool-using agent
  -&gt; structured extraction
  -&gt; deterministic verification
  -&gt; source-backed workflow state
  -&gt; human-review package
</code></pre></div></div>

<p>That is the layer around the model.</p>

<h2 id="the-hackathon-context">The hackathon context</h2>

<p>The event was SCSP’s 2026 National Security Technology Hackathon, hosted as part of the AI+ Expo ecosystem. SCSP describes the hackathon as bringing together AI engineers from academia, research institutions, the private sector, and government to build applications for national competitiveness and national security.</p>

<p>The first phase took place across San Francisco, Boston, and Washington, D.C. Teams built prototypes in tracks including Cloud Laboratories, Electric Grid Optimization, Wargaming, and GenAI.mil. FieldDesk won the GenAI.mil track for the Washington, D.C. first phase. The GenAI.mil track focused on AI tools for military administrative work, logistics, and tactical knowledge retrieval for rank-and-file users.</p>

<p>That framing was useful because it constrained the build. The prototype had to target a concrete user, a concrete workflow, and a concrete failure mode.</p>

<h2 id="the-workflow">The workflow</h2>

<p>The first workflow is TDY travel readiness.</p>

<p>A user enters mission intent:</p>

<blockquote>
  <p>Send 10 soldiers to Demo Training Site for training from June 10-14. Lodging and rental vehicles required.</p>
</blockquote>

<p>FieldDesk turns that sentence into structured workflow state:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>workflow: TDY travel
purpose: training
destination: Demo Training Site
dates: June 10-14
travelers: 10
lodging: required
rental vehicles: required
</code></pre></div></div>
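<p>One way to picture that state is as a typed object with a small validator. This is a minimal sketch, not the FieldDesk schema: the interface name, the field names, and the 2026 year on the dates are all assumptions made for illustration.</p>

```typescript
// Hypothetical shape for the structured state; field names are illustrative
// and are not taken from the FieldDesk repository.
interface TdyWorkflowState {
  workflow: "TDY travel";
  purpose: string;
  destination: string;
  startDate: string; // ISO date; the year is assumed for this sketch
  endDate: string;
  travelers: number;
  lodgingRequired: boolean;
  rentalVehiclesRequired: boolean;
}

// Returns a list of problems; an empty list means these checks passed.
function validateState(state: TdyWorkflowState): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(state.travelers) || state.travelers < 1) {
    problems.push("travelers must be a positive integer");
  }
  const start = Date.parse(state.startDate);
  const end = Date.parse(state.endDate);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    problems.push("dates must be valid ISO dates");
  } else if (end < start) {
    problems.push("endDate must not precede startDate");
  }
  if (state.destination.trim() === "") {
    problems.push("destination is required");
  }
  return problems;
}

const demoState: TdyWorkflowState = {
  workflow: "TDY travel",
  purpose: "training",
  destination: "Demo Training Site",
  startDate: "2026-06-10",
  endDate: "2026-06-14",
  travelers: 10,
  lodgingRequired: true,
  rentalVehiclesRequired: true,
};

console.log(validateState(demoState));
```

<p>The point of the typed shape is that later steps can rely on fields existing and having the right kind of value, instead of re-parsing free text.</p>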

<p>Then the agent searches mocked sources that represent the systems a real workflow would touch:</p>

<ul>
  <li>coordination messages</li>
  <li>document repositories</li>
  <li>rosters</li>
  <li>training orders</li>
  <li>GSA-style rate data</li>
  <li>JTR-style policy excerpts</li>
  <li>local SOPs and unit checklists</li>
  <li>uploaded corrections</li>
  <li>DTS-style export fields</li>
</ul>

<p>The initial run intentionally contains problems:</p>

<ul>
  <li>the mission says 10 travelers, but the roster only shows 8</li>
  <li>no funding memo is present</li>
  <li>the rental vehicle justification is too generic for review</li>
</ul>

<p>FieldDesk surfaces those as review blockers before the package reaches a human reviewer. After the user stages corrections, the system recomputes readiness and generates a package with an evidence map, reviewer objections, an action list, a packet summary, export rows, and a source-backed trace.</p>
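<p>Two of those blockers are mechanical enough to sketch as plain checks. The following is an illustrative example only: <code>EvidenceSummary</code>, <code>Blocker</code>, <code>findBlockers</code>, and the blocker codes are hypothetical names, and the rental justification check is left out because judging whether text is "too generic" is semantic work rather than a simple comparison.</p>

```typescript
// Hypothetical evidence summary; names are illustrative only.
interface EvidenceSummary {
  requestedTravelers: number;
  rosterNames: string[];
  fundingMemoPresent: boolean;
}

interface Blocker {
  code: string;
  message: string;
}

// Compares what the mission asked for against what the sources contain.
function findBlockers(evidence: EvidenceSummary): Blocker[] {
  const found: Blocker[] = [];
  const rosterCount = evidence.rosterNames.length;
  if (rosterCount !== evidence.requestedTravelers) {
    found.push({
      code: "ROSTER_MISMATCH",
      message: `Mission lists ${evidence.requestedTravelers} travelers but roster shows ${rosterCount}`,
    });
  }
  if (!evidence.fundingMemoPresent) {
    found.push({
      code: "FUNDING_MEMO_MISSING",
      message: "No funding memo found in sources",
    });
  }
  return found;
}

// Synthetic input mirroring the initial run: 10 requested, 8 on the roster.
const initialRun: EvidenceSummary = {
  requestedTravelers: 10,
  rosterNames: ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8"],
  fundingMemoPresent: false,
};

const blockers = findBlockers(initialRun);
console.log(blockers.map((b) => b.code));
```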

<p><img src="/images/fielddesk/fielddesk-final-package.png" alt="FieldDesk synthetic demo showing a TDY readiness workflow with source-backed evidence, deterministic per diem verification, and a review-ready packet" /></p>

<p><em>FieldDesk synthetic demo. The screenshot shows the final readiness package after corrections are staged: evidence map, reviewer notes, deterministic per diem verification, workflow progress, and export-oriented fields. All visible data is synthetic.</em></p>

<p>The core value moment is simple:</p>

<blockquote>
  <p>Catch avoidable administrative failures before review.</p>
</blockquote>

<h2 id="the-architecture">The architecture</h2>

<p>The prototype is a TypeScript / Next.js application with a controlled agent runtime.</p>

<p>The important design choice is the boundary between semantic reasoning and deterministic verification.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>intent
  -&gt; source selection
  -&gt; agent tool loop / synthesis
  -&gt; structured output schema
  -&gt; deterministic rules
  -&gt; validated workflow object
  -&gt; review package
</code></pre></div></div>

<p>The model owns semantic work:</p>

<ul>
  <li>interpreting mission intent</li>
  <li>choosing relevant source tools</li>
  <li>extracting facts from documents and messages</li>
  <li>identifying gaps and conflicts</li>
  <li>drafting reviewer objections and next actions</li>
</ul>

<p>The application owns system controls:</p>

<ul>
  <li>which sources are available</li>
  <li>which corrections have been staged</li>
  <li>which tools the agent can call</li>
  <li>schema validation</li>
  <li>per diem arithmetic</li>
  <li>audit trace construction</li>
  <li>final API contract</li>
</ul>

<p>This boundary matters because high-trust workflows should not put all correctness inside the model. The model can reason over messy context, but the platform should control what data exists, what tools can run, what math is trusted, what output shape is valid, and what a human can inspect.</p>

<p>A model response is not a workflow state.</p>

<p>A workflow state needs source references, typed fields, validation, deterministic checks, and a path to human review.</p>

<h2 id="tool-use-over-open-ended-chat">Tool use over open-ended chat</h2>

<p>The autonomous path uses fixture-backed tools instead of open-ended search.</p>

<p>That was intentional. In the public prototype, the tools are mocked. In a real environment, the same boundary would represent approved connectors into systems such as email, document stores, policy references, rate tables, and workflow systems.</p>

<p>The agent loop is bounded:</p>

<ol>
  <li>The model chooses a tool call.</li>
  <li>The API validates the tool arguments.</li>
  <li>The tool executes against controlled fixture data.</li>
  <li>The observation is appended to agent state.</li>
  <li>The loop continues until the model returns a final synthesis or reaches a step limit.</li>
  <li>The final object passes schema validation and deterministic checks.</li>
</ol>
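<p>Steps 1 through 5 of that loop can be sketched in a few lines. This is a simplified, synchronous illustration under stated assumptions: the model is replaced by a scripted function, the two tools and their outputs are made-up fixtures, and every name here (<code>runAgent</code>, <code>fixtureTools</code>, <code>ModelStep</code>) is hypothetical rather than FieldDesk's actual code. The final schema validation in step 6 is omitted.</p>

```typescript
type ToolCall = { kind: "tool"; name: string; args: { query: string } };
type FinalAnswer = { kind: "final"; summary: string };
type ModelStep = ToolCall | FinalAnswer;

type ToolFn = (query: string) => string;

// Hypothetical fixture-backed tools standing in for approved connectors.
const fixtureTools: { [toolName: string]: ToolFn } = {
  roster: () => "roster lists 8 travelers",
  funding: () => "no funding memo found",
};

interface AgentState {
  observations: string[];
  steps: number;
}

interface AgentResult {
  state: AgentState;
  final: FinalAnswer | null;
}

type Model = (state: AgentState) => ModelStep;

function runAgent(model: Model, maxSteps: number): AgentResult {
  const state: AgentState = { observations: [], steps: 0 };
  while (state.steps < maxSteps) {
    const step = model(state); // 1. the model chooses a tool call or finishes
    state.steps += 1;
    if (step.kind === "final") {
      return { state, final: step }; // 5. final synthesis ends the loop
    }
    // 2. validate the call: the tool must be registered and the query non-empty
    const known = Object.prototype.hasOwnProperty.call(fixtureTools, step.name);
    if (!known || step.args.query.trim() === "") {
      state.observations.push(`rejected call to "${step.name}"`);
      continue;
    }
    // 3 and 4. execute against fixture data and append the observation
    state.observations.push(fixtureTools[step.name](step.args.query));
  }
  return { state, final: null }; // 5. step limit reached without a synthesis
}

// Scripted stand-in for a model: two tool calls, then a summary.
const scriptedModel: Model = (state) => {
  if (state.steps === 0) return { kind: "tool", name: "roster", args: { query: "travelers" } };
  if (state.steps === 1) return { kind: "tool", name: "funding", args: { query: "memo" } };
  return { kind: "final", summary: state.observations.join("; ") };
};

const result = runAgent(scriptedModel, 5);
console.log(result.final);
```

<p>The step limit and the tool registry live in the application, so a misbehaving model can waste a few steps but cannot reach a tool that was never registered.</p>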

<p>That is a different product shape from a blank chat box.</p>

<p>A blank chat box asks the user to know what to ask. FieldDesk asks the system to know what the workflow requires.</p>

<h2 id="deterministic-checks-outside-the-model">Deterministic checks outside the model</h2>

<p>Some parts of the workflow should not be delegated to the model.</p>

<p>Per diem arithmetic is one example. FieldDesk lets the model extract trip facts and reason over evidence, but the application verifies rate calculations using deterministic rules.</p>
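<p>As a rough sketch of what "arithmetic in code" can look like: the model supplies extracted facts and a claimed total, and a pure function recomputes the total in integer cents. The rates below are invented for illustration, and the reduced first-and-last-day meals percentage is passed in as a parameter because it is an assumption of this example, not a statement of current policy. All names are hypothetical.</p>

```typescript
interface PerDiemInput {
  travelers: number;
  nights: number;
  travelDays: number;
  lodgingRateCents: number; // per night, per traveler
  mieRateCents: number; // meals and incidentals, per full day
  firstLastDayMiePercent: number; // assumed reduced rate for first and last day
}

// Pure integer arithmetic: no model involvement, no floating-point dollars.
function perDiemTotalCents(input: PerDiemInput): number {
  const lodging = input.nights * input.lodgingRateCents;
  const partialDays = Math.min(input.travelDays, 2);
  const fullDays = input.travelDays - partialDays;
  const partialMie = Math.round(
    (input.mieRateCents * input.firstLastDayMiePercent) / 100
  );
  const mie = fullDays * input.mieRateCents + partialDays * partialMie;
  return (lodging + mie) * input.travelers;
}

// The application, not the model, decides whether a claimed total is accepted.
function verifyClaimedTotal(claimedCents: number, input: PerDiemInput): boolean {
  return claimedCents === perDiemTotalCents(input);
}

// Synthetic trip: 10 travelers, 5 travel days, 4 nights, made-up rates.
const trip: PerDiemInput = {
  travelers: 10,
  nights: 4,
  travelDays: 5,
  lodgingRateCents: 11000,
  mieRateCents: 6800,
  firstLastDayMiePercent: 75,
};

console.log(perDiemTotalCents(trip)); // 746000 cents, i.e. $7,460.00
console.log(verifyClaimedTotal(750000, trip)); // false: the claim is rejected
```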

<p>That pattern generalizes:</p>

<ul>
  <li>let the model interpret messy language</li>
  <li>keep arithmetic in code</li>
  <li>keep source availability explicit</li>
  <li>keep output schemas strict</li>
  <li>keep human-review artifacts inspectable</li>
</ul>

<p>The goal is not to distrust the model. The goal is to put the model in the right part of the system.</p>

<h2 id="scaling-across-the-department">Scaling across the Department</h2>

<p>The TDY workflow is only one instance of a broader pattern.</p>

<p>The scalable version of FieldDesk is not one central team hard-coding every administrative process. It is a workflow layer where units can create, improve, and share operational workflows with each other.</p>

<p>A battalion that has already encoded a good travel-readiness workflow should not force every other battalion to rediscover the same checklist, source mapping, reviewer objections, and export fields. The useful artifact is not just the completed packet. It is the reusable workflow:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>workflow template
  -&gt; required sources
  -&gt; extraction schema
  -&gt; deterministic checks
  -&gt; reviewer questions
  -&gt; export format
  -&gt; local unit adjustments
</code></pre></div></div>

<p>That makes the system compounding. One unit can build a workflow for TDY packets. Another can adapt it for equipment turn-in. Another can encode a recurring training-readiness process. Over time, this points toward a Department-wide library of source-backed workflows that preserve local context without making every unit start from a blank chat box.</p>

<p>The important governance boundary is that shared workflows should be inspectable before they are adopted. A unit should be able to see what sources a workflow expects, what fields it extracts, what checks it runs, what assumptions it makes, and where human approval remains required.</p>
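<p>A shared workflow could be represented as plain data that a unit can read before adopting it. The sketch below is one possible shape under that assumption; <code>WorkflowTemplate</code>, <code>withLocalAdjustments</code>, <code>inspectionSummary</code>, and the sample values are hypothetical and do not describe an existing FieldDesk feature.</p>

```typescript
// Hypothetical template shape mirroring the pipeline stages listed above.
interface WorkflowTemplate {
  title: string;
  requiredSources: string[];
  extractionFields: string[];
  deterministicChecks: string[];
  reviewerQuestions: string[];
  exportFormat: string;
  humanApprovalSteps: string[];
}

// Returns a new template with unit-specific sources; the shared base is not mutated.
function withLocalAdjustments(
  base: WorkflowTemplate,
  unit: string,
  extraSources: string[]
): WorkflowTemplate {
  return {
    ...base,
    title: `${base.title} (${unit})`,
    requiredSources: [...base.requiredSources, ...extraSources],
  };
}

// What a unit would read before adopting a shared workflow.
function inspectionSummary(template: WorkflowTemplate): string[] {
  return [
    `sources: ${template.requiredSources.join(", ")}`,
    `fields: ${template.extractionFields.join(", ")}`,
    `checks: ${template.deterministicChecks.join(", ")}`,
    `human approval: ${template.humanApprovalSteps.join(", ")}`,
  ];
}

const tdyTemplate: WorkflowTemplate = {
  title: "TDY travel readiness",
  requiredSources: ["roster", "training order", "funding memo"],
  extractionFields: ["destination", "dates", "travelers"],
  deterministicChecks: ["per diem arithmetic", "roster count"],
  reviewerQuestions: ["Is the rental vehicle justification specific?"],
  exportFormat: "DTS-style rows",
  humanApprovalSteps: ["final packet review"],
};

const adapted = withLocalAdjustments(tdyTemplate, "Example Unit", ["local SOP"]);
console.log(inspectionSummary(adapted));
```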

<p>The end-state product thesis is not a single giant assistant. It is a shared workflow graph for administrative readiness: reusable enough to spread across units, local enough to adapt to mission context, and structured enough to keep evidence and accountability intact.</p>

<h2 id="evaluation">Evaluation</h2>

<p>FieldDesk includes a small evaluation harness around the demo workflow.</p>

<p>The evals use a golden scenario for the TDY case and check both exact fields and semantic quality:</p>

<ul>
  <li>exact checks for structured fields like dates, travelers, and estimated totals</li>
  <li>expected initial output before corrections</li>
  <li>expected corrected output after staged fixes</li>
  <li>tool-loop tests for source retrieval behavior</li>
  <li>deterministic-rule tests for calculation logic</li>
  <li>LLM-as-judge grading for groundedness and reasoning quality</li>
</ul>
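<p>The exact-field portion of such a harness can be very small. The following is a minimal sketch, assuming a golden record with three structured fields; the names and the total are hypothetical, and the LLM-as-judge grading is not shown because it depends on a model call.</p>

```typescript
// Hypothetical structured fields compared exactly against a golden scenario.
interface GoldenFields {
  dates: string;
  travelers: number;
  estimatedTotalCents: number;
}

// Returns one message per field that differs; an empty list is a pass.
function exactMismatches(expected: GoldenFields, actual: GoldenFields): string[] {
  const keys = Object.keys(expected) as (keyof GoldenFields)[];
  const mismatches: string[] = [];
  for (const key of keys) {
    if (expected[key] !== actual[key]) {
      mismatches.push(`${key}: expected ${expected[key]}, got ${actual[key]}`);
    }
  }
  return mismatches;
}

// Synthetic golden record; the total is a made-up value for this sketch.
const golden: GoldenFields = {
  dates: "June 10-14",
  travelers: 10,
  estimatedTotalCents: 746000,
};

// A run that picked up the roster count instead of the requested count.
const systemOutput: GoldenFields = {
  dates: "June 10-14",
  travelers: 8,
  estimatedTotalCents: 746000,
};

console.log(exactMismatches(golden, systemOutput));
```

<p>Keeping exact checks separate from graded checks means a wrong traveler count fails loudly even when the narrative output reads well.</p>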

<p>This is still a hackathon prototype, so the evals are small. But the principle matters.</p>

<p>For operational AI, evaluation has to move closer to the workflow. It is not enough to ask whether the model can answer a policy question. The system has to be evaluated on whether it helps a user produce a better package under realistic constraints:</p>

<ul>
  <li>Did it find the right sources?</li>
  <li>Did it expose missing evidence?</li>
  <li>Did it distinguish weak evidence from strong evidence?</li>
  <li>Did deterministic checks catch errors?</li>
  <li>Did the final package help a reviewer inspect the decision?</li>
</ul>

<p>Those are systems questions, not only model questions.</p>

<h2 id="why-this-matters">Why this matters</h2>

<p>Administrative work is easy to dismiss because it looks like paperwork.</p>

<p>But paperwork is often how organizations coordinate action. It decides whether people move, equipment arrives, travel happens, funding is available, permissions are granted, and plans become executable.</p>

<p>If that layer is slow, the organization is slow.</p>

<p>The right AI system should compress the evidence surface without hiding accountability. It should preserve sources, surface gaps, and help humans make better decisions faster.</p>

<p>That is why FieldDesk is not framed as “automating paperwork” in the simplistic sense. The technical goal is administrative readiness:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>less hunting
less rework
clearer gaps
typed workflow state
source-backed outputs
human-reviewable packages
</code></pre></div></div>

<h2 id="scope-and-safety">Scope and safety</h2>

<p>FieldDesk is a hackathon prototype, not a production military system.</p>

<p>The public repository uses synthetic demo data only. It does not include personal records, credentials, operational records, or government output. The connectors are mocked to show the workflow pattern without requiring access to production systems.</p>

<p>That constraint is intentional. The public demo is meant to make the product shape inspectable:</p>

<ul>
  <li>agent-led evidence gathering</li>
  <li>controlled tool use</li>
  <li>structured extraction</li>
  <li>gap detection</li>
  <li>correction staging</li>
  <li>deterministic verification</li>
  <li>reviewer-ready output</li>
</ul>

<p>The real work would be integrating the same patterns into approved environments, approved data sources, and accountable review processes.</p>

<h2 id="what-i-learned">What I learned</h2>

<p>The strongest AI products for government will probably not look like generic assistants.</p>

<p>They will look like workflow systems with models inside them.</p>

<p>The model will reason, but the product will define the task boundary. The product will decide what sources exist, what evidence counts, what output schema is valid, where uncertainty is shown, what a reviewer can inspect, and which actions require human approval.</p>

<p>That is the thread connecting FieldDesk, ATO Copilot, and the systems I keep building:</p>

<blockquote>
  <p>high-trust work needs source-backed AI, not just fluent AI.</p>
</blockquote>

<p>The bottleneck is shifting from access to model capability toward the ability to wrap that capability in systems that preserve evidence, judgment, and accountability.</p>

<p>FieldDesk is a small prototype pointed at that shift.</p>

<p>The prototype is public here: <a href="https://github.com/EthanHNguyen/FieldDesk">https://github.com/EthanHNguyen/FieldDesk</a></p>

<p>Sources: <a href="https://www.scsp.ai/hackathon/">SCSP Hackathon</a>, <a href="https://expo.scsp.ai/hackathon/">AI+ Expo Hackathon</a>, <a href="https://github.com/EthanHNguyen/FieldDesk">FieldDesk repository</a>.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="AI for Government" /><category term="GovTech" /><category term="Operational Trust" /><category term="Agentic AI" /><category term="Hackathon" /><summary type="html"><![CDATA[FieldDesk is a source-backed AI prototype for military administrative readiness. It won the DC first-phase GenAI.mil track qualifier at the SCSP 2026 National Security Technology Hackathon.]]></summary></entry><entry><title type="html">A Model Is Not an Operational System</title><link href="https://ethanhn.com/posts/2026/04/a-model-is-not-an-operational-system/" rel="alternate" type="text/html" title="A Model Is Not an Operational System" /><published>2026-04-28T00:00:00-07:00</published><updated>2026-04-28T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2026/04/a-model-is-not-an-operational-system</id><content type="html" xml:base="https://ethanhn.com/posts/2026/04/a-model-is-not-an-operational-system/"><![CDATA[<p>A model is not an operational system.</p>

<p>That sounds obvious, but it is easy to forget when the most visible progress in AI is measured through demos, benchmark scores, and model launches.</p>

<p>Those things matter. Capability matters. A stronger model expands the frontier of what is possible.</p>

<p>But capability by itself does not create operational value.</p>

<p>A model can answer a question once and still fail as a system. It can produce an impressive demo and still be hard to evaluate, recover, audit, or trust. It can be powerful in isolation and brittle inside a real workflow.</p>

<p>The interesting work is no longer just proving that frontier models are capable. That part is increasingly obvious.</p>

<p>The hard part is turning capability into systems people can depend on.</p>

<h2 id="the-missing-layer">The missing layer</h2>

<p>Between model capability and real-world value, there is a systems layer.</p>

<p>That layer includes:</p>

<ul>
  <li>infrastructure that can serve models reliably</li>
  <li>evaluation loops that catch behavior benchmarks miss</li>
  <li>interfaces that make uncertainty and provenance legible</li>
  <li>retrieval and data systems that ground outputs in context</li>
  <li>observability that shows when the system is failing</li>
  <li>recovery paths when the model, tool, or user workflow breaks</li>
  <li>trust mechanisms that let humans stay accountable</li>
</ul>

<p>This layer is where a model becomes useful.</p>

<p>It is also where many AI systems fail.</p>

<p>Not because the model is weak. Because the surrounding system is not designed for the environment it enters.</p>

<h2 id="capability-is-not-reliability">Capability is not reliability</h2>

<p>A frontier model can be impressive and unreliable at the same time.</p>

<p>It may solve a hard problem in one context and miss a simple constraint in another. It may produce a plausible answer without enough evidence. It may follow instructions correctly but fail to expose the assumptions behind its output.</p>

<p>In low-stakes settings, that may be acceptable. The user can retry, ignore the answer, or treat the model as a brainstorming partner.</p>

<p>In high-trust workflows, that is not enough.</p>

<p>A useful system should be able to answer:</p>

<ul>
  <li>Where did this claim come from?</li>
  <li>What evidence supports it?</li>
  <li>What changed since the last run?</li>
  <li>What is missing?</li>
  <li>What should a human review?</li>
  <li>What happens if the model is wrong?</li>
</ul>

<p>Those are systems questions, not just model questions.</p>

<h2 id="the-interface-matters">The interface matters</h2>

<p>A blank chat box is flexible, but it is often a weak interface for operational work.</p>

<p>It asks the user to know what to ask. It hides structure inside conversation. It makes repeatability and review harder than they need to be.</p>

<p>Many real workflows need the opposite.</p>

<p>They need systems that start from artifacts, constraints, evidence, and required outputs. They need interfaces that shape messy context into work products a human can inspect.</p>

<p>That is the difference between AI as an answer machine and AI as workflow infrastructure.</p>

<p>The right interface does not remove human judgment. It gives humans better instruments.</p>

<h2 id="evaluation-has-to-move-closer-to-the-workflow">Evaluation has to move closer to the workflow</h2>

<p>Benchmarks are useful, but they are not the same thing as operational evaluation.</p>

<p>A benchmark asks whether a model can perform a task under a defined test condition. A workflow asks whether the system can help a person complete real work under real constraints.</p>

<p>Those are related, but not identical.</p>

<p>Operational evaluation has to measure things like:</p>

<ul>
  <li>source grounding</li>
  <li>consistency across repeated runs</li>
  <li>failure visibility</li>
  <li>reviewer usefulness</li>
  <li>latency and cost under realistic usage</li>
  <li>behavior when context is incomplete</li>
  <li>handoff quality between model output and human decision</li>
</ul>

<p>The goal is not to replace benchmarks. The goal is to connect them to the conditions under which the system will actually be used.</p>

<h2 id="infrastructure-shapes-behavior">Infrastructure shapes behavior</h2>

<p>The serving layer is not neutral.</p>

<p>Latency changes how people use a system. Cost changes which workflows are feasible. Context windows shape what can be grounded. Quantization and routing decisions affect quality. Observability determines whether failures are noticed or silently absorbed.</p>

<p>The model matters, but so does everything around it:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>model capability
  -&gt; serving infrastructure
  -&gt; retrieval and tools
  -&gt; interface
  -&gt; evaluation
  -&gt; human workflow
  -&gt; operational trust
</code></pre></div></div>

<p>If any layer is weak, the value of the model is constrained by the system around it.</p>

<p>The teams that get value from AI will be the ones that build below the demo layer: model serving, telemetry, evaluation, provenance, interfaces, and recovery paths.</p>

<p>That is where capability becomes something an organization can use.</p>

<h2 id="what-operational-ai-should-make-visible">What operational AI should make visible</h2>

<p>Useful AI systems should make the surrounding work visible, not hide it behind a fluent answer.</p>

<p>A source-backed workflow should preserve provenance, surface reviewer questions, and keep humans accountable.</p>

<p>A public-data workflow should make messy information easier to explore, cluster, and reason over.</p>

<p>An infrastructure workflow should expose serving constraints, telemetry, quality tradeoffs, and failure recovery paths.</p>

<p>The common thread is operational translation.</p>

<p>The question for any AI project should be:</p>

<blockquote>
  <p>What has to exist around the model before the model becomes useful?</p>
</blockquote>

<h2 id="the-broader-shift">The broader shift</h2>

<p>As models get stronger, the bottleneck shifts.</p>

<p>It shifts from capability to reliability.</p>

<p>From demos to deployment paths.</p>

<p>From answers to evidence.</p>

<p>From prompts to interfaces.</p>

<p>From benchmark performance to operational trust.</p>

<p>The next generation of AI work will not be won by models alone. It will be won by the people and teams who can turn model capability into systems that survive contact with real constraints.</p>

<h2 id="build-the-layer-around-the-model">Build the layer around the model</h2>

<p>If you are bringing AI into a real workflow, do not stop at the model choice.</p>

<p>Map the system around it:</p>

<ul>
  <li>What evidence does the system need?</li>
  <li>How will outputs be evaluated?</li>
  <li>Where will uncertainty be shown?</li>
  <li>What can a reviewer inspect?</li>
  <li>How will failures be detected and recovered?</li>
  <li>What should be logged for audit and improvement?</li>
</ul>

<p>The call to action is simple: treat the model as one component in an operational system. Design the workflow, evidence, evaluation, interface, and recovery paths with the same seriousness as the model itself.</p>

<p>That is how AI moves from impressive output to dependable work.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="Frontier AI" /><category term="AI Infrastructure" /><category term="ML Systems" /><category term="Operational Trust" /><summary type="html"><![CDATA[Frontier model capability is only the beginning. The hard part is turning that capability into reliable, observable, source-backed systems that people can actually use and trust.]]></summary></entry><entry><title type="html">ATO Copilot and the Compliance Gap in Agentic Software</title><link href="https://ethanhn.com/posts/2026/04/ato-copilot-source-backed-ai/" rel="alternate" type="text/html" title="ATO Copilot and the Compliance Gap in Agentic Software" /><published>2026-04-27T00:00:00-07:00</published><updated>2026-04-27T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2026/04/ato-copilot-source-backed-ai</id><content type="html" xml:base="https://ethanhn.com/posts/2026/04/ato-copilot-source-backed-ai/"><![CDATA[<p>At c0mpiled-10/DC: AI for Government, I built ATO Copilot, a four-hour hackathon prototype for source-backed authorization support. It won 1st place, but the more interesting part was the question it explored.</p>

<blockquote>
  <p>Can AI help government teams move through authorization and compliance workflows by grounding every recommendation in source evidence?</p>
</blockquote>

<p>ATO is one instance of a broader pattern: high-trust workflows where messy evidence has to become structured reasoning before a human can make an accountable decision.</p>

<p>That question matters more now because software development is accelerating faster than compliance is.</p>

<p>Agentic coding tools are changing the rate at which software can be produced. A small team can now generate features, tests, infrastructure changes, documentation, and pull requests at a tempo that would have looked unrealistic a few years ago. The bottleneck is shifting away from writing code and toward proving that the resulting system can be trusted.</p>

<p>In high-trust environments, that proof burden does not disappear. It compounds.</p>

<p>Every system change creates questions:</p>

<ul>
  <li>What changed?</li>
  <li>Which controls are affected?</li>
  <li>What evidence supports the claim?</li>
  <li>What gaps remain?</li>
  <li>What would a reviewer ask?</li>
  <li>Which risks are acceptable, and who accepted them?</li>
</ul>

<p>Human-driven compliance workflows were already strained when software moved at human speed. They become structurally mismatched when software starts moving at agentic speed.</p>

<p>The issue is not that humans are unnecessary. It is that human review capacity becomes the scarce resource. If AI can generate more software, more configuration, more infrastructure, and more documentation, then organizations also need systems that can generate more traceability, more evidence mapping, more control awareness, and more reviewer-ready context.</p>

<p>Otherwise, the future looks like this:</p>

<ol>
  <li>Agentic engineering increases delivery velocity.</li>
  <li>Compliance teams receive more artifacts, more diffs, and more systems to review.</li>
  <li>Manual evidence collection and control mapping become the bottleneck.</li>
  <li>Authorization queues get longer.</li>
  <li>Teams route around the process or slow down to survive it.</li>
</ol>

<p>Neither outcome is acceptable.</p>

<p>The right answer is not to remove humans from authorization decisions. The right answer is to give humans better instruments.</p>

<h2 id="what-ato-copilot-does">What ATO Copilot does</h2>

<p>ATO Copilot is a prototype for source-backed authorization support.</p>

<p>The demo ingests synthetic evidence artifacts and produces structured control analysis. It focuses on a few concrete tasks:</p>

<ul>
  <li>mapping evidence to relevant control families</li>
  <li>surfacing likely reviewer questions</li>
  <li>identifying gaps or weak claims</li>
  <li>recommending next actions</li>
  <li>keeping outputs traceable to source material</li>
</ul>

<p>The important design choice is that the system is not a generic chatbot.</p>

<p>For example, if an evidence artifact claims that audit logging is enabled, a useful system should not simply repeat that claim. It should map the claim to the relevant control area, quote the exact source text, identify whether retention or access review evidence is missing, and suggest what a reviewer should inspect next.</p>

<p>A blank chat interface asks the user to know what to ask. A compliance workflow needs the opposite: it should shape messy evidence into reviewer-ready work products. The interaction should start with artifacts and produce structured analysis, not start with a prompt box and hope the user interrogates the system correctly.</p>

<p>That is the difference between AI as an answer machine and AI as workflow infrastructure.</p>

<p><img src="/images/ato-copilot-package-scan-complete.png" alt="ATO Copilot demo showing source-backed control analysis over synthetic evidence" /></p>

<p><em>ATO Copilot demo interface using synthetic evidence. The screenshot shows source-backed control analysis, reviewer questions, recommended actions, and provenance traces.</em></p>

<h2 id="why-this-matters">Why this matters</h2>

<p>Authorization, compliance, audit, procurement, grants, and program review workflows all share the same core pattern:</p>

<blockquote>
  <p>messy evidence → structured reasoning → accountable decision</p>
</blockquote>

<p>Frontier models are good at language and synthesis, but high-trust environments need more than plausible summaries. They need systems that can explain where claims came from, what evidence supports them, what remains uncertain, and what a responsible human should inspect next.</p>

<p>This becomes more urgent as agentic coding changes the economics of software production.</p>

<p>If one engineer can produce ten times more system changes, the review surface expands. If multiple agents can modify code, infrastructure, tests, and documentation in parallel, the evidence surface expands again. The compliance function cannot scale by asking humans to read every artifact manually at the same depth and speed.</p>

<p>That does not mean compliance should become fully autonomous. It means compliance tooling needs to become evidence-native.</p>

<p>A useful system should be able to answer:</p>

<ul>
  <li>Here is the evidence we found.</li>
  <li>Here is the control or requirement it appears to support.</li>
  <li>Here is the exact source text behind that mapping.</li>
  <li>Here is the part that looks weak.</li>
  <li>Here is the likely reviewer objection.</li>
  <li>Here is the next action a human should take.</li>
</ul>
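
<p>Those six answers can be sketched as a single record per finding. This is an illustrative shape only, not the schema ATO Copilot uses; every field name, the file name, and the quoted evidence text are assumptions.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass, field


@dataclass
class Finding:
    """One evidence-to-control mapping, kept traceable to its source."""
    control_area: str      # control or requirement the evidence appears to support
    source_file: str       # artifact the evidence came from
    source_quote: str      # exact source text behind the mapping
    gaps: list = field(default_factory=list)  # parts that look weak or missing
    reviewer_question: str = ""               # likely reviewer objection
    next_action: str = ""                     # what a human should do next

    def is_traceable(self):
        """A finding is traceable only if it names a source and quotes it."""
        return bool(self.source_file.strip()) and bool(self.source_quote.strip())


# Hypothetical finding built from a synthetic audit-logging claim.
finding = Finding(
    control_area="audit logging",
    source_file="logging-policy.md",
    source_quote="Audit logging is enabled for all production services.",
    gaps=["no retention evidence", "no access review evidence"],
    reviewer_question="How long are audit logs retained, and who reviews access?",
    next_action="Request the log retention configuration.",
)
print(finding.is_traceable())
</code></pre></div></div>

<p>Making the source quote a required field is the design choice that matters here: a finding without quoted evidence cannot be constructed by accident, and an empty quote fails the traceability check.</p>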

<p>This is the kind of work AI should do in government and regulated environments: compress the evidence surface, preserve traceability, and raise the quality of human judgment.</p>

<h2 id="the-product-thesis">The product thesis</h2>

<p>Most government AI demos begin as chat over documents. That can be useful, but it is not enough for high-trust workflows.</p>

<p>The stronger pattern is workflow-native AI:</p>

<ul>
  <li>source-backed</li>
  <li>auditable</li>
  <li>constrained</li>
  <li>artifact-driven</li>
  <li>embedded in real approval paths</li>
  <li>designed around trust, not just speed</li>
</ul>

<p>ATO Copilot is an early exploration of that pattern.</p>

<p>The prototype does not claim to automate authorization decisions. It does not contain CUI, customer data, or official assessment output. All demo evidence is synthetic.</p>

<p>The goal is narrower and more useful: show how AI can turn evidence artifacts into traceable control analysis and reviewer-ready next steps.</p>

<h2 id="the-broader-shift">The broader shift</h2>

<p>As software becomes more agentic, governance has to become more computational.</p>

<p>Not bureaucratic. Not performative. Computational.</p>

<p>That means policies, controls, evidence, approvals, and risks need to be represented in ways machines can help reason over while humans remain accountable for judgment.</p>

<p>The teams that figure this out will move faster without lowering trust. The teams that do not will face a growing mismatch between how quickly systems can be built and how slowly they can be approved.</p>

<p>That is the gap ATO Copilot is pointed at.</p>

<p>The prototype is public here: <a href="https://github.com/EthanHNguyen/ato-copilot">https://github.com/EthanHNguyen/ato-copilot</a></p>

<p>It uses synthetic evidence only and is intended as a product/architecture sketch, not an operational authorization system.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="AI for Government" /><category term="GovTech" /><category term="Operational Trust" /><category term="Compliance" /><category term="Agentic Coding" /><summary type="html"><![CDATA[ATO Copilot is a source-backed AI prototype exploring evidence, provenance, reviewer questions, and operational trust in government authorization workflows.]]></summary></entry><entry><title type="html">Frontier AI, Engineered for the Real World</title><link href="https://ethanhn.com/posts/2026/04/frontier-ai-engineered-real-world/" rel="alternate" type="text/html" title="Frontier AI, Engineered for the Real World" /><published>2026-04-27T00:00:00-07:00</published><updated>2026-04-27T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2026/04/frontier-ai-engineered-real-world</id><content type="html" xml:base="https://ethanhn.com/posts/2026/04/frontier-ai-engineered-real-world/"><![CDATA[<p>Frontier AI is creating a new kind of engineering problem.</p>

<p>It is not just model research. It is not just app development. It is not just infrastructure.</p>

<p>The hard work sits between those layers.</p>

<p>The model may be capable. The product may have a real user need. The infrastructure may run. The interface may be polished. But if those pieces are designed separately, the system can still fail when it reaches a real workflow.</p>

<p>Frontier AI engineering is the discipline of connecting them.</p>

<h2 id="the-work-is-cross-layer">The work is cross-layer</h2>

<p>Many AI projects break because each layer makes locally reasonable decisions that do not compose.</p>

<p>A model team optimizes capability. An infrastructure team optimizes latency and cost. A product team optimizes the user flow. A security team limits exposure. An evaluator measures task performance. Each one may be right in isolation.</p>

<p>The system only works when someone can reason across the whole stack:</p>

<ul>
  <li>model behavior and failure modes</li>
  <li>retrieval, context, and data quality</li>
  <li>serving constraints, latency, and cost</li>
  <li>evaluation tied to workflow outcomes</li>
  <li>interface design that exposes uncertainty</li>
  <li>security, privacy, and governance boundaries</li>
  <li>human review, override, and recovery paths</li>
</ul>

<p>That is the engineering gap frontier AI exposes.</p>

<h2 id="product-judgment-matters-more-not-less">Product judgment matters more, not less</h2>

<p>As models get stronger, it is tempting to treat product design as a thin wrapper around intelligence.</p>

<p>That is backwards.</p>

<p>Stronger models make product judgment more important because the system can now do more consequential work. The product has to decide what the model should do, what it should not do, what evidence it should show, and where human judgment should remain in control.</p>

<p>A generic chat interface can demonstrate capability. It rarely defines a dependable workflow.</p>

<p>Real-world AI products need sharper answers:</p>

<ul>
  <li>What is the user’s actual decision or work product?</li>
  <li>What context is required before the model acts?</li>
  <li>Which outputs need citations, confidence signals, or review states?</li>
  <li>What failure should be obvious instead of hidden?</li>
  <li>Where should the system ask for human input instead of guessing?</li>
</ul>

<p>These are product questions, but they are inseparable from engineering.</p>

<h2 id="infrastructure-changes-the-product">Infrastructure changes the product</h2>

<p>The serving layer is not invisible plumbing.</p>

<p>Latency changes interaction design. Cost changes what can run continuously. Context limits change what can be grounded. Model routing changes quality. Logging changes evaluation. Security controls change what data can be used.</p>

<p>Every infrastructure decision becomes a product decision once users depend on the system.</p>

<p>That is why frontier AI engineering requires fluency in both directions. Engineers need to understand how infrastructure constraints shape user experience. Product teams need to understand how model and platform constraints shape what can be promised.</p>

<h2 id="evaluation-has-to-become-operational">Evaluation has to become operational</h2>

<p>Offline benchmarks are useful, but they are not enough to run a real AI workflow.</p>

<p>Evaluation is the discipline that turns model behavior from something observed into something managed. It is also one of the foundations of trust. Without evaluation, teams are left guessing whether a system is improving, regressing, or quietly failing in the parts of the workflow that matter most.</p>

<p>Operational evaluation asks a different set of questions:</p>

<ul>
  <li>Did the system preserve the evidence a reviewer needs?</li>
  <li>Did it fail loudly when context was missing?</li>
  <li>Did repeated runs produce stable enough outputs?</li>
  <li>Did the interface help the user decide what to trust?</li>
  <li>Did reviewers know when to rely on the system and when to challenge it?</li>
  <li>Did latency, cost, or routing change behavior in production?</li>
  <li>Did the handoff from model output to human action actually work?</li>
</ul>

<p>This is where model evaluation becomes systems evaluation.</p>

<h2 id="build-for-the-handoff-from-model-to-workflow">Build for the handoff from model to workflow</h2>

<p>The frontier is no longer only the model. It is the handoff from model capability to real work.</p>

<p>Teams applying frontier AI should build the discipline to ask cross-layer questions early:</p>

<ul>
  <li>What does the user need to decide or produce?</li>
  <li>What model behavior is reliable enough for that workflow?</li>
  <li>What evidence, context, and provenance must be preserved?</li>
  <li>What infrastructure constraints will shape the experience?</li>
  <li>What evaluation loop will catch real failures?</li>
  <li>What should humans inspect, override, or audit?</li>
  <li>What recovery path exists when the system is wrong?</li>
</ul>

<p>Build AI teams and systems that can reason across the full stack. The advantage will belong to the people who can translate frontier capability into workflows that survive production constraints, human judgment, evaluation, and organizational trust.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="AI Infrastructure" /><category term="ML Systems" /><category term="Frontier AI" /><category term="Operational Trust" /><summary type="html"><![CDATA[Frontier AI engineering is becoming its own discipline: translating model capability into reliable workflows through infrastructure, evaluation, product judgment, and trust design.]]></summary></entry><entry><title type="html">Why Google?</title><link href="https://ethanhn.com/posts/2025/10/why-google/" rel="alternate" type="text/html" title="Why Google?" /><published>2025-10-01T00:00:00-07:00</published><updated>2025-10-01T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2025/10/why-google</id><content type="html" xml:base="https://ethanhn.com/posts/2025/10/why-google/"><![CDATA[<p>When I decided to join Google, the question I kept coming back to was simple:</p>

<blockquote>
  <p>Where can I learn the most about turning frontier AI capability into systems people can actually use?</p>
</blockquote>

<p>There are many good reasons to work at a large technology company. Scale, brand, compensation, and access all matter. But those were not the main reasons for me.</p>

<p>The main reason was proximity to the full system.</p>

<h2 id="the-full-stack-of-ai-value">The full stack of AI value</h2>

<p>A model is not a product by itself.</p>

<p>A benchmark score is not a workflow.</p>

<p>A demo is not an operational system.</p>

<p>The interesting work happens in the translation layer between model capability and real-world value. That layer includes:</p>

<ul>
  <li>infrastructure that can serve advanced models reliably</li>
  <li>evaluation systems that reveal behavior benchmarks miss</li>
  <li>interfaces that help people use AI without hiding uncertainty</li>
  <li>data and retrieval systems that ground outputs in real context</li>
  <li>reliability mechanisms that make failure visible and recoverable</li>
  <li>trust layers that let organizations adopt AI responsibly</li>
</ul>

<p>That is the layer I wanted to understand more deeply.</p>

<p>Google is one of the few places where the whole stack exists at serious scale: research, models, infrastructure, products, cloud, security, developer platforms, and users with real operational constraints.</p>

<p>That matters because the bottleneck in AI is shifting.</p>

<p>It is no longer enough to ask whether a model is powerful. The better question is whether that capability can survive contact with production systems, organizational workflows, latency budgets, security requirements, and human judgment.</p>

<h2 id="why-scale-matters">Why scale matters</h2>

<p>Scale is easy to talk about and hard to internalize.</p>

<p>At small scale, many problems look like product problems. At large scale, they become systems problems.</p>

<p>Reliability becomes a design constraint. Evaluation becomes continuous. Interfaces need to preserve context. Infrastructure choices show up as user experience. Small failure modes compound across millions or billions of interactions.</p>

<p>That is why I was drawn to Google.</p>

<p>The company operates at a level where AI is not just an experiment. It has to become infrastructure.</p>

<p>This does not make every problem glamorous. In fact, some of the most important work is not glamorous at all. It is debugging, hardening, measuring, simplifying, integrating, and making systems legible enough for people to trust.</p>

<p>But that is precisely the work I wanted to get closer to.</p>

<h2 id="the-kind-of-engineer-i-want-to-become">The kind of engineer I want to become</h2>

<p>I do not want to be an engineer who only understands models in isolation.</p>

<p>I also do not want to be an engineer who only knows how to wrap a model in an application.</p>

<p>The goal is to become the kind of builder who can bridge model capability, deployment reality, and product judgment.</p>

<p>That means learning how frontier AI systems behave under real constraints:</p>

<ul>
  <li>What breaks when a model leaves a benchmark?</li>
  <li>What does reliability mean when outputs are probabilistic?</li>
  <li>How should humans stay in the loop without becoming bottlenecks?</li>
  <li>What should be measured before a system is trusted?</li>
  <li>What interfaces make uncertainty visible instead of burying it?</li>
</ul>

<p>These are not abstract questions. They show up in real systems, with real users, and real consequences.</p>

<h2 id="why-google-specifically">Why Google, specifically</h2>

<p>Google sits at a rare intersection:</p>

<ul>
  <li>deep AI research</li>
  <li>world-class infrastructure</li>
  <li>massive production systems</li>
  <li>cloud and enterprise deployment</li>
  <li>security and reliability culture</li>
  <li>products used by real people every day</li>
</ul>

<p>That intersection is valuable because it forces a builder to think across layers.</p>

<p>It is not enough to care about capability. You have to care about the path from capability to usefulness.</p>

<p>It is not enough to care about shipping. You have to care about what happens after the system meets reality.</p>

<p>It is not enough to care about scale. You have to care about whether the system remains understandable, reliable, and trusted as it scales.</p>

<p>That is the environment I wanted to learn from.</p>

<h2 id="the-broader-thesis">The broader thesis</h2>

<p>The next phase of AI will not be defined only by who has the strongest model.</p>

<p>It will be defined by who can turn model capability into systems people can depend on.</p>

<p>That requires infrastructure, evaluation, reliability, interfaces, and trust.</p>

<p>That is why Google made sense to me.</p>

<p>Not because it is a destination, but because it is one of the best places to study the problem I care about most:</p>

<blockquote>
  <p>How do frontier AI systems become useful in the real world?</p>
</blockquote>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><category term="Career" /><category term="AI Infrastructure" /><category term="ML Systems" /><category term="Frontier AI" /><summary type="html"><![CDATA[Why Ethan Nguyen joined Google: proximity to the full stack of AI value, from frontier model capability to infrastructure, evaluation, reliability, interfaces, and trust.]]></summary></entry><entry><title type="html">All models are wrong, but some are useful</title><link href="https://ethanhn.com/posts/2024/10/23/models/" rel="alternate" type="text/html" title="All models are wrong, but some are useful" /><published>2024-10-23T00:00:00-07:00</published><updated>2024-10-23T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2024/10/23/all-models-are-wrong</id><content type="html" xml:base="https://ethanhn.com/posts/2024/10/23/models/"><![CDATA[<blockquote>
  <p>All models are wrong, but some are useful. [1]</p>
</blockquote>

<p>I heard this quote today during a wonderful workshop on CyberSecurity Essentials by one of my colleagues. This aphorism succinctly describes how I think about my own knowledge. What I know is a model of the world. Be humble. Assume knowledge is wrong but useful. Seek to learn every day.</p>

<p>[1] <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">https://en.wikipedia.org/wiki/All_models_are_wrong</a></p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><summary type="html"><![CDATA[All models are wrong, but some are useful. [1]]]></summary></entry><entry><title type="html">Happy Hours with a Mentor</title><link href="https://ethanhn.com/posts/2024/10/23/happy-hours" rel="alternate" type="text/html" title="Happy Hours with a Mentor" /><published>2024-10-23T00:00:00-07:00</published><updated>2024-10-23T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2024/10/23/happy-hours</id><content type="html" xml:base="https://ethanhn.com/posts/2024/10/23/happy-hours"><![CDATA[<p>One of my mentors is a Director of Software Engineering. We recently met during happy hours. I asked him, “What advice would you share with your kids?”</p>

<p>He encourages his kids to master 3 skills:</p>

<ol>
  <li>Communication</li>
  <li>Time management</li>
  <li>Networking</li>
</ol>

<p>As I continue in my career, I agree more with his advice.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><summary type="html"><![CDATA[One of my mentors is a Director of Software Engineering. We recently met during happy hours. I asked him, “What advice would you share with your kids?”]]></summary></entry><entry><title type="html">Book Summary &amp;amp; Reflection - The Anatomy of the Swipe: Making Money Move</title><link href="https://ethanhn.com/posts/2023/12/20/book-summary-swipe/" rel="alternate" type="text/html" title="Book Summary &amp;amp; Reflection - The Anatomy of the Swipe: Making Money Move" /><published>2023-12-20T00:00:00-08:00</published><updated>2023-12-20T00:00:00-08:00</updated><id>https://ethanhn.com/posts/2023/12/20/book-summary</id><content type="html" xml:base="https://ethanhn.com/posts/2023/12/20/book-summary-swipe/"><![CDATA[<p>Ever since I got my first credit card, I have been interested in better understanding how card payments work. There are ~276 million card transactions handled every day [1]. How are these transactions handled? What is the business model of the incumbent businesses? What are the major pain points of the current payments system? How would you create a new credit card offering?</p>

<p>A fellow engineer at Capital One recommended I read “The Anatomy of the Swipe: Making Money Move” by Ahmed Siddiqui to better understand the payments industry. Here, I’ll summarize my key learnings and reflect on what it means for technologists in finance.</p>

<h2 id="overview">Overview</h2>

<p>There are four parts required to facilitate a card payment:</p>
<ul>
  <li>A card (physical, virtual, token)</li>
  <li>Merchant</li>
  <li>Payment Network</li>
  <li>Secure internet connection to transmit messages.</li>
</ul>

<p>When a customer goes to pay at a merchant, a request to approve the transaction is sent in this basic (but not exhaustive) flow: Merchant -&gt; Merchant Acquirer -&gt; Acquirer Processor -&gt; Card Network -&gt; Issuer Processor -&gt; Issuer.</p>

<p>Definition of Key Players:</p>
<ul>
  <li>Merchant - business selling the good or service. If they’re physical, they need a machine to read the card. If they’re virtual, they need a payment gateway. Examples of Merchants: Walmart, Costco, and Amazon.</li>
  <li>Merchant Acquirer - Partners with Merchants and provides them the tools and facilities to accept and process card-based payments. Examples of Merchant Acquirer: Chase Paymentech and Global Payments.</li>
  <li>Acquirer Processor - The Merchant Acquirer’s technology partner for connecting to the payment network. Usually, the acquirer processor has the hardware to connect to the payment network and request approval of a transaction. Examples of Acquirer Processors: Redsys, Monext, Elavon.
    <ul>
      <li>Some merchant acquirers have built this in-house; others rely on a third party.</li>
    </ul>
  </li>
  <li>Payment Network (aka Card Scheme) - Provide infrastructure for card-based transactions. Sit between Acquirers and Issuers and pass messages back and forth to enable transaction. Payment networks also set the communication rules and standards. Example of Payment Network: Visa, Mastercard, American Express, Discover</li>
  <li>Issuer Processor - The Issuer’s technology partner for connecting to the payment networks. This technology provider will usually have hardware in its data center and a fast network connection to the payment network to approve or decline transactions. Examples of Issuer Processors: TSYS, Galileo, i2c.
    <ul>
      <li>Some Issuers have built this in-house; others rely on a third party.</li>
      <li>Note: Capital One has partnered with TSYS to help process their credit cards.</li>
    </ul>
  </li>
  <li>Issuer - an Issuer or Issuing Bank’s purpose is to underwrite the user by granting them access to a bank account and potentially access to credit facilities. Ex: JPMorgan Chase, Capital One, Citi, and Wells Fargo.</li>
</ul>

<p>The issuer will then decide whether to approve or deny the transaction. If approved, the issuer will place a hold on the funds. Later, at the end of the day, the Merchant will confirm transactions and include tips, transaction reversals, and refunds. Finally, money is moved between customer and merchant (clearing).</p>

<h2 id="payment-ecosystem">Payment Ecosystem</h2>
<p>There are two types of card networks:</p>
<ul>
  <li>Open Networks (Visa, Mastercard) - There are multiple Merchant Acquirers and Issuers. Card network makes money through fees.
    <ul>
      <li>Pros: Good for distribution. Get brand into as many consumers and merchants as possible.</li>
      <li>Cons: Complex to coordinate players.</li>
    </ul>
  </li>
  <li>Closed Networks (American Express, Discover) - Take their own interchange, Acquirer, and Network Assessment fees. Can adjust fee based on Merchant size. Visa accounts for ~50% of all purchase volume. American Express accounts for 13% and Discover for 2%.
    <ul>
      <li>Pros: Revenue per swipe is higher.</li>
      <li>Cons: Total swipe volume is lower since the network is not as large.</li>
    </ul>
  </li>
  <li>For debit cards, each Card Network has a secondary network brand for PIN Debit or Automated Teller Machine (ATM).
    <ul>
      <li>PIN Debit Network. Visa - Interlink, Visa-Net Debit. Mastercard - Maestro. Discover - Pulse</li>
      <li>Durbin Amendment - every debit card must have a secondary unaffiliated network.</li>
    </ul>
  </li>
  <li>ATM Networks. ATM charges user a fee. Issuer of card is charged an Interchange fee by the ATM.
    <ul>
      <li>ATM Networks - Visa - Plus. Mastercard - Cirrus. Discover - Pulse.</li>
      <li>“Free” ATM networks - MoneyPass and Allpoint. Only charges the issuing bank. No cost directly charged to customer.</li>
    </ul>
  </li>
</ul>

<p>Flow of Payments within the Payment Ecosystem</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th style="text-align: center">Merchant</th>
      <th style="text-align: center">Acquiring Bank</th>
      <th style="text-align: center">Card Network</th>
      <th style="text-align: center">Issuing Bank</th>
      <th style="text-align: center">Customer</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Payment</td>
      <td style="text-align: center">+</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center">-</td>
    </tr>
    <tr>
      <td>Acquirer Fee</td>
      <td style="text-align: center">-</td>
      <td style="text-align: center">+</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
    </tr>
    <tr>
      <td>Network Assessment Fee</td>
      <td style="text-align: center">-</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center">+</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
    </tr>
    <tr>
      <td>Interchange</td>
      <td style="text-align: center">-</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center">+</td>
      <td style="text-align: center"> </td>
    </tr>
    <tr>
      <td>Rewards</td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center"> </td>
      <td style="text-align: center">-</td>
      <td style="text-align: center">+</td>
    </tr>
  </tbody>
</table>

<p>Note: “Rewards” was added by me. What I find fascinating about this diagram is that every player in the ecosystem is incentivized to participate (i.e. every player has a “+”). I wonder - what are the costs of such a payment ecosystem?</p>
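<p>One way to read the table is as a ledger for a single purchase. The sketch below is only an illustration: the four rates are hypothetical numbers I picked to make the arithmetic visible, not real fee schedules. It shows that the flows net to zero, so every “+” for a bank or network is funded out of the merchant’s side of the payment.</p>

```python
from decimal import Decimal

# Hypothetical rates, chosen only to make the arithmetic visible.
ACQUIRER_FEE_RATE = Decimal("0.0030")
NETWORK_ASSESSMENT_RATE = Decimal("0.0014")
INTERCHANGE_RATE = Decimal("0.0180")
REWARDS_RATE = Decimal("0.0100")


def payment_flows(amount):
    """Return the net position of each player for one purchase."""
    acquirer_fee = amount * ACQUIRER_FEE_RATE
    network_fee = amount * NETWORK_ASSESSMENT_RATE
    interchange = amount * INTERCHANGE_RATE
    rewards = amount * REWARDS_RATE
    return {
        "merchant": amount - acquirer_fee - network_fee - interchange,
        "acquiring_bank": acquirer_fee,
        "card_network": network_fee,
        "issuing_bank": interchange - rewards,
        "customer": -amount + rewards,
    }


flows = payment_flows(Decimal("100.00"))
for player, net in flows.items():
    print(f"{player:15s} {net:+.2f}")
```

<p>With these assumed rates, the merchant keeps $97.76 of a $100.00 sale, and the remaining $2.24 is split between the acquirer, the network, the issuer, and the customer’s rewards.</p>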

<h2 id="authorizations">Authorizations</h2>
<ul>
  <li>Authorization - happens at the moment of swipe, dip, or tap at the payment terminal. It either places a hold on funds in the cardholder’s account or declines the transaction.</li>
  <li>Credit cards are often Dual-message Signature transactions - Authorization (message one) happens at the time of swipe. Clearing (message two) follows in bulk at the end of the night. Sometimes the Clearing amount differs from the Authorization amount, for example when a tip is added.</li>
</ul>
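<p>The two messages can be sketched as a hold followed by a posting. This is a minimal model under my own assumptions: the <code>Account</code> class, its field names, and the amounts are hypothetical, and real issuers track far more state than this.</p>

```python
from dataclasses import dataclass


@dataclass
class Account:
    credit_limit: int   # cents
    balance: int = 0    # posted charges, in cents
    holds: int = 0      # authorized but not yet cleared, in cents

    def available(self):
        return self.credit_limit - self.balance - self.holds


def authorize(account, amount):
    """Message one: place a hold, or decline if funds are insufficient."""
    if amount > account.available():
        return False
    account.holds += amount
    return True


def clear(account, authorized_amount, final_amount):
    """Message two: release the hold and post the final amount (e.g. with tip)."""
    account.holds -= authorized_amount
    account.balance += final_amount


card = Account(credit_limit=50_000)
approved = authorize(card, 4_000)   # $40.00 dinner at swipe time
clear(card, 4_000, 4_800)           # cleared that night with an $8.00 tip
```

<p>Amounts are kept in integer cents to avoid floating-point rounding. The hold and the posted charge are separate fields so that the cleared amount is free to differ from the authorized one.</p>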

<h2 id="clearing">Clearing</h2>

<ul>
  <li>Clearing - The term “Clearing” is used primarily by Issuers. Merchant Acquirers call it “Capture”. Clearing happens at the end of the day. The Merchant includes tips, transaction reversals, and returns, and confirms that transactions are valid and funds are ready to be settled.</li>
  <li>Settlement - actual movement of money from cardholder’s bank account (Issuing Bank) to the Merchant’s bank account (Acquiring Bank). Typically happens via Fedwire.</li>
  <li>Card Network does not send the full amount. Card Network will
    <ul>
      <li>keep a percentage for itself as the Network Assessment Fee</li>
      <li>take a percentage and pass it on to the card Issuer as the Interchange Fee</li>
    </ul>
  </li>
  <li>Merchant Acquirer charges merchant an Acquirer fee too. This is charged at the end of the month.</li>
</ul>

<h2 id="chargeback">Chargeback</h2>

<ul>
  <li>Chargeback - when a cardholder doesn’t recognize a charge on a card, they may request their money back through the Issuing Bank. Chargebacks may also be used when goods or services have not been provided by the Merchant, but the Merchant refuses a refund.
    <ul>
      <li>Chargebacks cost merchants $25-35 per transaction. Often, merchants will eat the cost of low-value transactions.</li>
    </ul>
  </li>
  <li>Merchants are encouraged to keep their chargeback rate below 1%. Otherwise, Card Network may remove the merchant.</li>
</ul>
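<p>The two numbers above lend themselves to a quick check. The sketch below is an assumption-laden illustration: the function names are mine, the 1% threshold is the figure from the notes, and the $25 default is the low end of the quoted fee range.</p>

```python
def chargeback_rate(chargebacks, transactions):
    """Fraction of transactions in a period that ended in a chargeback."""
    if transactions == 0:
        return 0.0
    return chargebacks / transactions


def at_risk(chargebacks, transactions, threshold=0.01):
    """True when the rate reaches the 1% level mentioned above."""
    return chargeback_rate(chargebacks, transactions) >= threshold


def cheaper_to_refund(sale_amount, chargeback_fee=25.0):
    """Low-value sales can cost less to refund than to dispute."""
    return chargeback_fee > sale_amount


print(at_risk(12, 1000))         # 1.2% of transactions charged back
print(cheaper_to_refund(10.0))   # a $10 sale against a $25 fee
```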

<p>To help fight fraud and reduce the number of chargebacks, there are a number of technologies in play:</p>
<ul>
  <li>EMV Chip Card - Originally stood for “Europay, Mastercard, Visa”, which established the technical standard for encoding card data onto a secure chip on a card. These cards are “dipped” into card terminals. Far more secure than data stored on a magnetic stripe.
    <ul>
      <li>Merchants that have implemented this standard are not liable for fraud. The issuing bank must eat the cost of fraud here.</li>
    </ul>
  </li>
  <li>3D Secure - Standard for offering cardholders more security in online transactions. Involves the use of a one-time PIN or passcode.</li>
</ul>

<h2 id="banks">Banks</h2>

<ul>
  <li>Banks have 3 purposes - Issue cards. Serve as Acquiring Banks to merchants. Facilitate movement of real money.
    <ul>
      <li>Only banks can issue credit cards. If you would like to issue a card and you are not a bank, then you must partner with a bank.</li>
    </ul>
  </li>
  <li>New challenger banks are disrupting the field. They partner with small banks not subject to Durbin Amendment and its limits on interchange. Neo-banks make money off deposits and debit card interchange.
    <ul>
      <li>Note: Mobile has enabled these neo-banks to proliferate. Many consumers no longer care about the proximity of a physical bank branch when they can access the bank’s app in their pocket. Neobanks compete with traditional banks with various features such as 2-day early payday, better underwriting models for underbanked groups (ex: Karat credit card for content creators), and interest-free secured credit cards.</li>
    </ul>
  </li>
</ul>

<h2 id="taking-payments">Taking Payments</h2>

<ul>
  <li>Independent Sales Organization (ISO) - ISO is granted a license to sell Merchant acquiring services from a Merchant Acquirer</li>
  <li>Payment Facilitator (PF or PayFac) - Layer on top of a Merchant Acquirer. Payment facilitators can onboard very quickly and offer out-of-the-box hardware and software to enable a merchant to take payment.
    <ul>
      <li>Advantages: Fast setup. Fixed pricing. Managed Fraud.</li>
      <li>Drawback: Being a sub-merchant. Fixed pricing can be expensive.</li>
    </ul>
  </li>
  <li>How do payment facilitators make money? Revenue from Software or Hardware. Revenue from each transaction. Out of that revenue, they pay the Merchant Acquirer its Acquirer fee, the card Issuer its interchange, and the Card Network its network assessment fees. A PF can aggregate transactions and negotiate lower Acquirer fees.</li>
  <li>Using a Merchant Acquirer
    <ul>
      <li>Advantages: Being a Direct Merchant. Hardware and Software options. Interchange Plus Pricing. Can use flat fee or Interchange Plus. Faster Funds settlement.</li>
      <li>Disadvantages: More paperwork. Manage fraud directly.</li>
    </ul>
  </li>
  <li>How do Merchant Acquirers make money? Revenue from Hardware. Revenue from each transaction.</li>
  <li>Payment Service Provider - Aggregator of payment methods. Allows website to get paid via debit card, mobile wallets, and financing schemes such as Buy Now, Pay Later.</li>
</ul>

<h2 id="making-payments">Making Payments</h2>

<ul>
  <li>Co-Brand Partner - typically a brand or company marketing the card.</li>
  <li>Program Manager - manages the day-to-day operations of the card program including settlement, fraud management, and maintaining the relationship of the Issuing Bank, card manufacturer, card network, and cardholder.
    <ul>
      <li>Makes money by getting portion of interchange</li>
    </ul>
  </li>
  <li>Issuer Processor - connection between card and network. Needs to parse an ISO8583 message (standard card transmission) and respond in 3 seconds. Licenses a piece of hardware from the Network commonly referred to as a Mastercard Interface Processor (MIP) for Mastercard and a VisaNet Integrated Processing (VIP) for Visa.
    <ul>
      <li>Also responsible for integration with the co-brand. May provide alerts or the ability to turn the card on/off. Issuer Processors provide APIs and documentation.</li>
      <li>Makes money as a utility based on number of cards or on each transaction.</li>
    </ul>
  </li>
  <li>JIT Funding - Forward details from card swipe for approval.</li>
  <li>Issuing Bank - works with Program manager to provide the settlement and bank accounts
    <ul>
      <li>Makes money by possibly charging Program Manager fees for setting up bank accounts, performing compliance audits, and general oversight. Issuing Bank also sponsors BIN. Card networks only give BIN to banks.</li>
    </ul>
  </li>
</ul>

<h2 id="know-your-customer-kyc">Know Your Customer (KYC)</h2>

<ul>
  <li>KYC - Practice in banking or finance used to attach identity to a user of a product</li>
  <li>For a better user experience, you should progressively ask for more and more information from the customer on a need-to-know basis. This step is critical especially for underbanked customers.</li>
</ul>

<h2 id="credit-vs-debit-card">Credit vs Debit Card</h2>

<ul>
  <li>23% of millennials don’t carry credit cards (TD Bank’s Annual Consumer Spending Survey)
    <ul>
      <li>Prefer debit cards for the simplicity of seeing their spending balance</li>
      <li>Note: I believe there’s an opportunity here to help millennials / generation Z better understand and feel secure with their money. The increased costs of higher education (and their associated student loans), increased cost of living, and skepticism towards social safety nets such as Social Security are contributing factors to feeling anxious about one’s personal finance.</li>
    </ul>
  </li>
  <li>Some industries such as travel prefer credit cards because funds are guaranteed by the Issuing Bank</li>
</ul>

<h2 id="interchange">Interchange</h2>

<ul>
  <li>Durbin Amendment - Banks with assets greater than \$10 billion are regulated for Interchange fees. Regulated Issuers get \$0.21 + 0.05% of the transaction amount + \$0.01 for fraud.</li>
  <li>Merchants who clear faster qualify for lower Interchange rates.</li>
  <li>Level 3 data - Some merchants provide receipt-level details (Level 3 data) to card networks. When this data is provided, lower interchange is charged.</li>
</ul>
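<p>The regulated debit formula above is simple enough to work through in code. This is a sketch of the arithmetic only: the function name is mine, and rounding to the nearest cent is an assumption rather than something stated in the notes.</p>

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_FEE = Decimal("0.21")            # flat component
AD_VALOREM_RATE = Decimal("0.0005")   # 0.05% of the transaction amount
FRAUD_ADJUSTMENT = Decimal("0.01")    # fraud component


def regulated_debit_interchange(amount):
    """Interchange on one debit transaction for a regulated Issuer."""
    fee = BASE_FEE + amount * AD_VALOREM_RATE + FRAUD_ADJUSTMENT
    # Assumption: round to the nearest cent.
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(regulated_debit_interchange(Decimal("100.00")))    # 0.27
print(regulated_debit_interchange(Decimal("1000.00")))   # 0.72
```

<p>The flat component dominates: a purchase ten times larger earns the regulated Issuer less than three times the interchange, which is part of why neo-banks partner with small banks that are not subject to these limits.</p>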

<p>Other solutions for merchants to lower interchange fees:</p>

<ul>
  <li>Private label cards - Some merchants offer their own cards to use at the store. These cards only work at the store that issued the card. Benefits include direct access to customer spending data, reserves spot in customer wallet, no interchange paid to issuer, little or no fees paid to acquirer, brand loyalty.</li>
  <li>Co-Brand Cards - Brand partners with an issuing bank and card network. For example, Amazon partnered with Chase and Visa to create an Amazon credit card. Since Amazon arranges this deal, they will typically get lower network assessment fees with the network and lower interchange with the issuing bank. The brand and issuing bank can also earn interchange on transactions outside of brand.</li>
</ul>

<h2 id="moving-money-without-card-networks">Moving Money without Card Networks</h2>

<ul>
  <li>ACH
    <ul>
      <li>Over 82% of electronic payments in the US are ACH (Automated Clearing House)</li>
      <li>ACH is a technology offered by the “Clearing House,” which is a nonprofit organization. This is a network of banks that have come together to enable movement of money interbank through the use of bank account and routing number. This is a batch process.</li>
      <li>Efficient and inexpensive. However, it’s not the fastest. Since it’s a batch process, there are “cutoff windows”</li>
      <li>Direct Deposit - type of ACH transfer that typically comes from an employer into an employee’s bank account. The employer gives the Fed a NACHA file to transfer the funds. This NACHA file is often sent early, but funds are only moved on the effective date. However, some banks are willing to loan the money for 2 days once they see the NACHA file from the employer.</li>
      <li>NACHA is pushing for faster payments by offering Same-Day ACH</li>
    </ul>
  </li>
  <li>Peer-to-Peer and ACH
    <ul>
      <li>Venmo - Venmo, using Plaid, can see the state of the bank account. Venmo can safely float the funds for a couple of days as it waits for the ACH transfer. To the recipient, however, the funds appear to arrive immediately.</li>
      <li>Zelle - Groups of banks share a ledger. Money moves quickly and does not wait for “settlement”.</li>
    </ul>
  </li>
  <li>Wire Transfer
    <ul>
      <li>Way to move money (usually large dollar amounts) from one bank to another securely and quickly by using the account and routing numbers of the sending and receiving banks.</li>
      <li>In the US, the Federal Reserve provides Fedwire, which is the primary way to wire funds between banks. It is supported by almost every bank. The Clearing House also provides a wire service called Clearing House Interbank Payments System (CHIPS).</li>
      <li>Movement of money is instant. However, humans need to confirm that money has moved.</li>
    </ul>
  </li>
  <li>Real-time Payments
    <ul>
      <li>Wire transfers are used for large transfers, but can also be applied to smaller payments. The main barrier is that the cost of a wire is high because humans need to confirm money movement.</li>
      <li>Real-Time Payments (RTP) - a service offered by The Clearing House that pushes money directly to a bank account within seconds.</li>
      <li>Cost is capped at $0.045 per transfer</li>
    </ul>
  </li>
</ul>
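<p>To make the ACH “cutoff windows” concrete, here is a small sketch of how a batch schedule delays a payment. The cutoff times are hypothetical (real windows depend on the bank and the ACH operator), and the sketch ignores weekends and holidays.</p>

```python
from datetime import datetime, time, timedelta

# Hypothetical cutoff times; real windows depend on the bank and ACH operator.
CUTOFFS = [time(10, 30), time(14, 45), time(18, 0)]


def next_batch(submitted_at):
    """Return the datetime of the first cutoff at or after submission."""
    for cutoff in CUTOFFS:
        candidate = datetime.combine(submitted_at.date(), cutoff)
        if candidate >= submitted_at:
            return candidate
    # Missed the last window: wait for the first batch of the next day.
    next_day = submitted_at.date() + timedelta(days=1)
    return datetime.combine(next_day, CUTOFFS[0])


print(next_batch(datetime(2023, 8, 21, 11, 0)))   # picked up by the 14:45 batch
print(next_batch(datetime(2023, 8, 21, 19, 0)))   # rolls over to the next morning
```

<p>A payment submitted a minute after the last cutoff waits overnight, which is the gap that Same-Day ACH and RTP are meant to close.</p>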

<h2 id="how-do-you-create-a-credit-card">How do you create a credit card?</h2>
<p>As an engineer working within the Card division at Capital One, I have thoroughly enjoyed learning about the various parts of the payment ecosystem. I don’t know a lot about payments yet, but I am learning more every day.</p>

<p>One observation that I’ve had is that most of the innovation happens on either end of the swipe — closest to the merchant or customer. The underlying core infrastructure often remains stagnant on old technology (however, this is changing too. re: peer-to-peer payments, decentralized ledgers, blockchain). At Capital One, the company is focused on the customer side of payments. One driving question I’ve been asking myself is, “What valuable niches exist in the market and how can Capital One design a credit card for that niche?”</p>

<p>There are many ways to spot niches in the market. You can divide the market up into new to credit, mainstreet, premium, and ultra-premium. You can look at consumer vs. business cardholders. Or, you can also look at the payment market from the merchant side too and examine co-brand opportunities.</p>

<p>Once you have identified the niche, you need to create a monetized product to cater to those consumers. At Capital One, there are two main sources for revenue in credit cards: credit card interest (16.6 billion USD in 2022) and interchange fees (4.6 billion USD in 2022) [2]. The credit card must offer the consumers compelling benefits in order to 1) use the credit card (interchange fees) and 2) revolve and eventually pay off a credit card balance (net interest income).</p>

<p>Now, you need to create the credit card. This involves designing the credit card benefits, advertising the card, handling customer applications, underwriting each application, obtaining capital to loan, and servicing the card. Fortunately, Capital One (and other credit card companies) are platform companies that have already built a lot of the tools in-house. They benefit from economies of scale. They have low-cost access to capital through their own customer deposits. They have advertising partnerships with top athletes. They have pre-existing web, mobile, and physical channels for applying for and servicing a card. Lastly, they are a brand you would turn to for other financial products such as another credit card, a high-yield checking/savings account, or a car loan.</p>

<p>Similar to how AWS has become the platform to power the web, Capital One needs to become the platform for consumer financial products. Software will play a critical role in 1) automating processes such as handling applications, issuing virtual cards, underwriting each application, analyzing fraud risk, and marketing new cards and 2) analyzing large amounts of data to better underwrite and create additional financial products.</p>

<p>Sources:</p>

<p>[1] The Federal Reserve Payments Study: 2022 Triennial Initial Data Release. <a href="https://www.federalreserve.gov/paymentsystems/fr-payments-study.htm">https://www.federalreserve.gov/paymentsystems/fr-payments-study.htm</a></p>

<p>[2] 2022 Capital One Annual Report. <a href="https://www.capitalone.com/investor/financials/annual-report/">https://www.capitalone.com/investor/financials/annual-report/</a></p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><summary type="html"><![CDATA[Ever since I got my first credit card, I have been interested in better understanding how card payments work. There are ~276 million card transactions handled every day [1]. How are these transactions handled? What is the business model of the incumbent businesses? What are the major pain points of the current payments system? How would you create a new credit card offering?]]></summary></entry><entry><title type="html">Vector Databases</title><link href="https://ethanhn.com/posts/2023/08/vector-databsase/" rel="alternate" type="text/html" title="Vector Databases" /><published>2023-08-20T00:00:00-07:00</published><updated>2023-08-20T00:00:00-07:00</updated><id>https://ethanhn.com/posts/2023/08/vector-databases</id><content type="html" xml:base="https://ethanhn.com/posts/2023/08/vector-databsase/"><![CDATA[<h3 id="what-are-vector-databases">What are Vector Databases?</h3>

<p>Vector databases are databases that store vectors. Their core function is semantic search — if you have an input vector, you can find the top k most similar vectors in the database.</p>
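<p>As a rough illustration of “find the top k most similar vectors”, here is a brute-force sketch using cosine similarity. The tiny three-dimensional vectors and their labels are made up for the example, and the code assumes no vector is all zeros. Real vector databases typically use approximate nearest-neighbor indexes instead of scanning every vector.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the k (id, score) pairs most similar to the query."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings, keyed by a human-readable label.
store = {
    "refund request": [0.9, 0.1, 0.0],
    "pricing question": [0.1, 0.9, 0.1],
    "shipping delay": [0.7, 0.2, 0.3],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
print([vec_id for vec_id, _ in results])
```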

<h3 id="why-vector-databases">Why Vector Databases?</h3>

<p>Large language models (LLMs) are a type of generative AI that generates text. LLMs are trained on vast amounts of text data, but they have two limitations:</p>

<ol>
  <li>LLMs are trained on a lot of data, and compress that vast training data into their limited memory. They lose some information in this compression.</li>
  <li>LLMs do not have access to proprietary information. LLMs are only trained on publicly available data.</li>
</ol>

<p>To fix these two issues, one popular approach has been to feed the information the LLM needs into its context [1]. LLMs have “good” reasoning ability [2], and will use this context to craft a more helpful and specific response. Vector databases are used to store this context information.</p>

<h3 id="how-can-they-be-used-in-business-to-automate-process-x">How can they be used in business to automate process X?</h3>

<p>Vector databases are used to give more information to LLMs outside their training data. For business, this often includes their own proprietary data. Let’s walk through a use case.</p>

<p>Assume a business conducts sales through email. It hires salespeople with varying levels of ability. The best salesperson can sell 40% more products than the average employee. As ML engineers, we can design a system to help the average salesperson match the sales figures of the best salesperson. Here’s how:</p>

<p>The business has a record of each customer email and how its best salesperson responded. These pairs of emails are examples that we can use. We insert these email pairs into a vector database. Now, when the business receives an email from a customer, the system can find similar email examples in the vector database, pass them into the LLM’s context, and ask the LLM to generate a new reply to the customer using these examples.</p>

<p>Using this system, every salesperson in the company can be brought up to the level of the best salesperson.</p>
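<p>The retrieval step of that system can be sketched in a few lines. Everything here is a stand-in I made up for illustration: <code>embed</code> is a toy bag-of-words function rather than a real embedding model, the email pairs are invented, and a plain list plays the role of the vector database. The resulting prompt would then be sent to an LLM, which is not shown.</p>

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical (customer email, best salesperson's reply) pairs.
EXAMPLES = [
    ("Do you offer a discount for bulk orders?", "Yes, orders over 100 units get 15% off."),
    ("Can I get a demo of the product?", "Happy to help. Here is a link to book a demo."),
    ("What is your return policy?", "You can return any item within 30 days."),
]


def build_prompt(incoming_email, k=2):
    """Pick the k most similar past emails and format them as examples."""
    query = embed(incoming_email)
    ranked = sorted(EXAMPLES, key=lambda pair: similarity(query, embed(pair[0])), reverse=True)
    shots = "\n\n".join(f"Customer: {q}\nReply: {a}" for q, a in ranked[:k])
    return f"{shots}\n\nCustomer: {incoming_email}\nReply:"


prompt = build_prompt("Is there a bulk discount for large orders?")
print(prompt)
```

<p>The prompt ends with an open “Reply:” so the model continues in the style of the retrieved examples. Swapping the toy <code>embed</code> for a real embedding model changes the quality of the matches but not the shape of the pipeline.</p>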

<h3 id="where-can-i-learn-more">Where can I learn more?</h3>

<p>I find LLMs and vector databases fascinating too! There are many resources to learn more about them. Here are a few:</p>

<p>If you want to understand the landscape of vector databases and their technical details, check out The Data Quarry’s series of blog posts: <a href="https://thedataquarry.com/posts/vector-db-1/">https://thedataquarry.com/posts/vector-db-1/</a></p>

<p>If you’re interested in seeing a tutorial for this use case, the YouTube channel AI Jason has an excellent video for this: <a href="https://www.youtube.com/watch?v=c_nCjlSB1Zk">https://www.youtube.com/watch?v=c_nCjlSB1Zk</a></p>

<h3 id="conclusion">Conclusion</h3>

<p>Vector databases are used to store additional contextual information for an LLM. This can include information more specific to the problem such as sales data. It is important to note that vector databases are used for more than just LLMs.</p>

<p>LLMs have potential to not only automate processes but also close the skill gap between senior and junior employees. They can also help retain proprietary knowledge when senior employees leave. In business, ML systems can help automate processes and improve the productivity of their employees.</p>

<p>Notes:</p>

<p>[1] The other common approach is fine-tuning.</p>

<p>[2] LLMs are known to make reasoning mistakes. It is still unclear whether current LLMs are capable of true reasoning. However, this will likely improve over time with more data and better models. Check out these research articles: <a href="https://arxiv.org/pdf/2212.10403.pdf">https://arxiv.org/pdf/2212.10403.pdf</a> and <a href="https://arxiv.org/pdf/2303.12712.pdf">https://arxiv.org/pdf/2303.12712.pdf</a>.</p>]]></content><author><name>Ethan Nguyen</name><email>contact@ethannguyen.anonaddy.com</email></author><summary type="html"><![CDATA[What are Vector Databases?]]></summary></entry></feed>