Lessons from government AI pilots that successfully scaled

AI is no longer a future conversation for government agencies 

During PayIt’s recent AI-focused webinars, one message came through clearly: Government leaders aren’t debating whether AI will affect modernization. It’s already an operating reality, and public sector leaders are trying to figure out how to adopt it responsibly without undermining trust, privacy, or service quality.

In PayIt’s “From Hype to How: A Practical Playbook for AI and Government” and “How the Trust Gap Could Affect AI Modernization and What Leaders Should Do About It,” panelists shared what’s working now and what’s getting in the way. This post recaps the most practical guidance from the speakers across both sessions:

  • Adam Christensen, Senior Vice President, Product, PayIt
  • Christian Napier, Director of Artificial Intelligence, Utah Division of Technology Services
  • Alberto Gonzalez, ITS Administrator and Chief Information Officer, State of Idaho
  • Rachel Stern, Founder & Managing Partner, GovTech Ventures

Plan to start with trust and measurable outcomes

Christian Napier said Utah’s early approach came down to two foundational questions:

  1. Given the challenge of hallucination, can AI be deployed responsibly enough to be trusted?
  2. What internal-facing use cases could create meaningful impact for a workforce of 22,000 employees?

This approach avoids a common pitfall: selecting pilots based on novelty rather than measurable value, safety, and adoption.

Initial implementations: How are agencies using generative AI in government?

Adam Christensen said many agencies start by using AI to remove grunt work from day-to-day life, so benefits are immediately visible. In practice, this often means internal tools first (where risk is lower and iteration is faster) before moving AI into resident-facing experiences. Common first-wave internal use cases include:

  • Meeting capture and summaries 
  • Drafting routine communications 
  • Synthesizing research interviews and qualitative feedback
  • Knowledge assistance for frontline staff

These early deployments reduce the time staff spend on repeatable, manual work and build comfort with AI tools before higher-stakes rollouts.

How to choose AI pilots that actually make it to production

Pilot selection determines whether AI projects stall or scale. Utah initially ran many chatbot pilots because they were easy to stand up, but the team encountered two common problems:

  • Pilots are cheap; production is not. Free proofs of concept often mask long-term costs.
  • “Sidecar” tools don’t change behavior. If AI sits next to an existing process instead of improving it, adoption drops.

Since that initial foray into AI pilots, Utah’s team has adjusted its strategy, focusing much more at the outset on understanding business value.

Instead of starting with what is technically feasible, they now begin with these questions:

  • What business value will this deliver?
  • Will it meaningfully change a workflow?
  • Is the outcome measurable?

That adjustment led Utah to prioritize higher-impact, higher-value use cases, even if they took more upfront planning.

A resident-facing pilot with clear ROI: document verification before an office visit

Adam Christensen offered a concrete example of a “smallest useful pilot” that still drives meaningful outcomes: residency document verification. Instead of residents bringing documents into an office, AI can validate them digitally in advance. A pilot like this can:

  • Eliminate multiple trips for residents
  • Reduce staff time spent scanning, uploading, and checking documents
  • Enable “FastPass”-style routing for pre-verified residents
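
To make the routing concrete, here’s a minimal, purely illustrative sketch in Python. The VerificationResult fields, the 0.90 confidence threshold, and the outcome labels are assumptions for illustration, not a description of PayIt’s actual implementation:

    from dataclasses import dataclass

    # Hypothetical output of an AI document check; the fields are
    # illustrative, not a real PayIt or vendor API.
    @dataclass
    class VerificationResult:
        document_type: str   # e.g., "utility_bill"
        confidence: float    # model confidence that the document is valid
        issues: list[str]    # anything flagged (expired date, blurry scan, ...)

    def route_resident(result: VerificationResult, threshold: float = 0.90) -> str:
        """Decide what happens before the resident ever visits an office."""
        if result.issues:
            # Let residents fix problems at home instead of at the counter.
            return "request_resubmission"
        if result.confidence >= threshold:
            # Pre-verified residents skip the document check at the counter.
            return "fastpass"
        # Low-confidence checks go to a person, preserving a human review path.
        return "human_review"

    # Example: a clean, high-confidence utility bill gets FastPass routing.
    print(route_resident(VerificationResult("utility_bill", 0.97, [])))

The key design choice is that flagged or low-confidence documents always fall back to a human reviewer, which preserves the auditability this post returns to in the checklist below.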

He described a success path with clear timeboxes:

  • A 60-day confidence/testing phase with agency documents
  • A 90-day soft or hard launch for a specific service

This is a useful model for agencies hoping to succeed with AI pilots: narrow scope, measurable outcomes, and a workflow change that benefits both residents and staff.

What AI is delivering today in government

Rather than isolated experiments, the webinars highlighted a pattern: AI works best when paired with clear guardrails, clean data, and real operational goals.

High-accuracy support in tax operations

Utah piloted an internal chatbot to support tax call center employees, running a vendor “bake-off” to test accuracy. After a second tuning phase, the chatbot produced responses judged equivalent to or better than a knowledgeable human 97% of the time. The solution is now in production for the tax agency and will expand to others.
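
For readers curious how a bake-off figure like that 97% gets computed, here’s a deliberately simple sketch; the three-level judging scale is an assumption for illustration, not Utah’s actual rubric:

    # Human judges compare each chatbot answer to a knowledgeable human's
    # answer and label it "better", "equivalent", or "worse" (assumed scale).
    judgments = ["equivalent", "better", "equivalent", "worse", "equivalent"]

    passing = sum(j in ("equivalent", "better") for j in judgments)
    accuracy = passing / len(judgments)
    print(f"{accuracy:.0%} of responses judged equivalent or better")  # 80%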

Workforce productivity gains at scale 

Utah’s pilot of Gemini for Google Workspace showed regular users saving an average of 3.5 hours per week.

Today:

  • About 9,500 of Utah’s 22,000 employees are active users
  • Agencies opt in, allowing adoption to grow at a sustainable pace
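
To put that in perspective: if those roughly 9,500 active users each averaged the reported 3.5 hours of weekly savings, that would be on the order of 33,000 staff hours reclaimed every week (9,500 × 3.5 = 33,250).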

Automation that reduces physical office demand

Alberto Gonzalez shared that Idaho reduced DMV walk-in traffic by 40% using automation and machine learning, without layoffs. Instead:

  • Staff were repurposed and upskilled
  • Attrition, not reductions, absorbed long-term changes

Faster, better-informed case decisions 

In child welfare, Idaho is using controlled AI models to analyze years of case data to support permanency decisions — work that previously took hours now takes minutes.

Data hygiene is the unglamorous prerequisite that determines whether AI works

In the second webinar, Alberto Gonzalez emphasized a simple principle: AI readiness starts with data readiness.

Residents want accurate answers and decisions, and governments have a responsibility to provide accurate, accessible information.

Rachel Stern added a risk that agencies underestimate: incomplete data sets.

“Decisions are being made with incomplete data sets, and to me that is where AI becomes dangerous, where you are taking in sort of bits and pieces of the truth and then making system-wide decisions.”

This is why “AI strategy” often fails when it overlooks fundamental tasks, such as document management and consistent operating procedures.

High-value government AI is often focused on automation

Across both webinars, automation emerged as a feasible and high-value shift. Napier predicted that automation is what will change most in the next 12 months — moving from sidecar AI tools to rip-and-replace workflows.

Gonzalez echoed that philosophy from a different angle. He had been pushing automation before today’s generative AI wave, and he highlighted an outcome that matters to every public servant: greater throughput without staff reductions.

A few core risks to plan for 

  1. Hallucinations and accuracy drift
    Utah treated trust as a foundational requirement and tested performance rigorously (e.g., accuracy targets for tax chatbots).
  2. Incomplete or messy data
    Both webinars returned to the same root cause: bad inputs produce bad outputs. Data hygiene isn’t optional if AI will influence decisions.
  3. Privacy and security concerns
    Napier emphasized a “privacy is paramount” posture, describing layered review: “The office of data privacy conducts its own initial privacy impact assessment for any pilots, and then, in addition, our security team will do a security assessment as well.”
  4. Workforce anxiety
    Rachel Stern described a real organizational risk: If AI removes easy tasks, staff may be left with the most complex, emotionally difficult cases, impacting satisfaction. Leaders need to plan for employee experience, not just efficiency.
  5. Public trust and transparency gaps
    Don’t assume better service is enough without proactive communication of accountability, metrics, and safeguards.

The role of training and change management

Utah’s training approach started with policy and scaled into repeatable education:

  • Enterprise generative AI policy requiring employee acknowledgment
  • Responsible AI training and annual security awareness training
  • Lunch-and-learns, cross-agency user groups, and tailored training by request
  • Open office hours

Napier also called out something many organizations underestimate: organic growth. “They’re figuring it out themselves, and their natural inclination when they figure out some cool use case is to share that.”

Pilots are a bridge, but procurement models must evolve

The webinar discussions also raised a structural constraint: Procurement tends to buy what governments already know to ask for. Stern argued for earlier-stage discovery, leaning on RFIs rather than jumping straight to RFPs, and more openness to vendor ideas before locking in requirements.

Gonzalez reinforced this with practical tactics:

  • Use consortium vehicles where possible to shorten timelines
  • Choose trusted partners who integrate cleanly rather than forcing lock-in

A checklist you can apply immediately

Choosing your first AI pilots

  • Pick a narrow workflow with measurable outcomes (time saved, rework reduced, errors reduced)
  • Prioritize internal use cases first to build confidence and competence
  • Avoid “sidecar” tools unless you have a clear adoption plan and the process genuinely benefits from a helper

Preparing for resident-facing AI

  • Start with transactional, bounded use cases (document verification, routing, status checks)
  • Design for auditability: human review paths, logs, and QA
  • Build trust through transparency: what the system does, what it doesn’t do, and how errors are handled

Scaling responsibly

  • Treat data hygiene as a program, not a one-time task
  • Establish cross-functional gates (privacy, security, operations, legal/regulatory)
  • Budget for production early, and don’t assume pilot economics translate to ongoing operations
  • Choose vendors that integrate with existing systems and keep your options open

Governments that are winning with AI are doing three things well

Across both webinars, the leaders who see progress aren’t chasing novelty. They’re doing three things consistently:

  1. Starting with trust and discipline: test accuracy, define guardrails, and plan for production
  2. Fixing the fundamentals: clean data, clear operating procedures, and governed access
  3. Targeting automation that improves outcomes: better workflows and service delivery 

As we predicted, AI will have more impact in government this year because it’s finally being applied with intention. What determines success is whether government agencies can resist the urge to chase novelty and instead focus on fundamentals, measurable outcomes, and improved digital government services.
