Every AI conversation in field service eventually lands on the same uncomfortable truth: the data isn't ready. Not because the organization doesn't have data — most utilities and telecoms are drowning in it — but because no one has systematically discovered where it all lives, assessed its quality, or organized it for the AI use cases that would actually move the needle.

After 15+ years of implementing Oracle Field Service across utilities, energy companies, and telecom operators, I've learned that data elicitation — the structured process of discovering, inventorying, and preparing your data — is the single most undervalued step in any AI initiative. Get it right and everything accelerates. Skip it and you'll spend months debugging AI models that are really just reflecting your data problems back at you.

This playbook walks through the how: practical techniques for uncovering what data you actually have, frameworks for organizing it, and industry-specific guidance for utilities and telecom. Whether you're starting an AI initiative or just trying to get your data house in order, this is the work that makes everything else possible.

You can't give your field operations AI superpowers if you don't know what raw material the AI has to work with. Data elicitation is the inventory before the transformation.

Why Data Elicitation Comes Before Everything Else

Most organizations jump straight from "we want AI" to "let's build a model." The missing step in between — discovering what data actually exists, where it lives, what state it's in, and who depends on it — is where projects succeed or fail.

In a typical utility or telecom, field service data is spread across 15–50+ systems accumulated over decades of growth, mergers, and technology cycles. Work order data in Oracle Field Service. Asset records in an ERP. Spatial data in GIS. Sensor data in SCADA. Customer information in a CIS. Technician skills in... a spreadsheet someone updated last year.

Research suggests that 40–60% of operational knowledge in field service organizations resides outside formal systems — in personal spreadsheets, email chains, chat messages, dispatcher notebooks, and the heads of experienced workers. This "shadow data" is often the most valuable for AI, and the hardest to discover.

Data elicitation makes the invisible visible. It's the difference between building AI on a foundation you understand and building it on assumptions you'll regret.

Step 1: Discover What You Actually Have

Before you can organize data, you need to find it. Here are the proven techniques — from structured system audits to the frontline conversations that reveal what no system inventory ever will.

A. System-of-Record Inventory

Start with the obvious: catalog every enterprise system that touches field service operations. ERP, WFM, GIS, CIS, SCADA, CRM, fleet tracking, document management. For each system, document the owner, the data it holds, its integration points, and how current the data is.

But don't stop at major systems. Survey departments for tools they use daily — that Access database in the maintenance office, the Power BI report someone built manually, the Google Sheet that tracks certifications. These departmental tools often contain data that the enterprise systems lack.
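A minimal sketch of what one inventory entry might look like, assuming you track it as structured data rather than a slide deck. The field names and example systems are illustrative:

```python
from dataclasses import dataclass

@dataclass
class SystemRecord:
    # One row of the inventory; the fields mirror what the text says to capture.
    name: str
    owner: str
    data_held: list            # domains or entities the system stores
    integration_points: list   # systems it exchanges data with
    currency: str              # how current the data is: "real-time", "nightly", "manual"
    is_shadow: bool = False    # departmental tool outside IT governance?

inventory = [
    SystemRecord("Oracle Field Service", "Field Ops IT",
                 ["work orders"], ["ERP", "CIS"], "real-time"),
    SystemRecord("Certification spreadsheet", "Training dept",
                 ["technician certifications"], [], "manual", is_shadow=True),
]

# Shadow tools with no integration points hold data nothing else can see --
# flag them for follow-up during elicitation.
orphaned_shadow = [s.name for s in inventory if s.is_shadow and not s.integration_points]
```

Even this much structure lets you query the inventory — for example, listing every shadow tool that no enterprise system can reach.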

B. Data Flow Mapping

Trace the lifecycle of a work order from creation to completion. At each stage, document: what data is created, what data is consumed, what system holds it, and who touches it. Follow 3–5 representative work order types end-to-end — a routine maintenance job, an emergency call, a planned capital project, a customer-initiated service request.

This reveals integration gaps that system inventories miss. You'll discover points where data is re-keyed manually between systems, where spreadsheets bridge gaps, and where information simply gets lost in the handoff.
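One way to record a traced lifecycle, sketched as data. The stages and systems below are illustrative; the point is that flagging each handoff as "integration" or "manual" makes the re-key points queryable:

```python
# One traced work order, stage by stage: what data each stage creates and
# whether the handoff to the next stage is automated or re-keyed by hand.
lifecycle = [
    ("creation",   "CIS",                    ["customer request", "premise ID"], "integration"),
    ("scheduling", "Oracle Field Service",   ["appointment window"],             "integration"),
    ("dispatch",   "Oracle Field Service",   ["assigned technician"],            "manual"),
    ("completion", "Dispatcher spreadsheet", ["resolution notes"],               "manual"),
]

# Manual hops are exactly the re-key and data-loss points described above.
manual_hops = [(stage, system) for stage, system, _, transfer in lifecycle
               if transfer == "manual"]
```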

C. Shadow Data Discovery

This is where the real gold is — and where most organizations skip the work. Shadow data includes every spreadsheet, personal file, whiteboard note, and tribal knowledge that people rely on daily but that no system captures.

The "What breaks when the system goes down?" test: Ask each team what they do when their primary system is unavailable. The workarounds they describe reveal shadow processes and data sources that are invisible during normal operations.

"Day in the Life" shadowing: Spend a full day physically following a dispatcher, a technician, and a planner. Document every time they reference something outside the official system. This single technique — borrowed from ethnographic research — consistently uncovers more data sources than weeks of interviews.

D. Frontline Engagement

Your technicians, dispatchers, and customer service reps are the people closest to the data. They know where it's wrong, where it's missing, and what they wish they had. But they're also the hardest to pull into data conversations — they're measured on productivity, not on attending workshops.

The solution is to engage them on their own terms: short, focused conversations during shift changes and ride-alongs, not workshop invitations that compete with the productivity targets they're measured on.

Step 2: Organize Data Into the Eight Domains That Matter

Once you've discovered what exists, organize it into business-relevant domains. For field service, these eight domains cover the data landscape that AI needs to operate:

Domain 1: Work Order Data

The backbone. Job types, priorities, lifecycle timestamps (created, scheduled, dispatched, arrived, completed), symptom codes, resolution codes, parts consumed, revisit flags. Where most AI learning happens.

Domain 2: Asset & Equipment Data

Asset registry, maintenance history, failure codes, manufacturer specs, condition data, warranty status, parent-child relationships. The foundation for predictive maintenance.

Domain 3: Workforce Data

Technician skills, certifications (with expiration dates), availability, real-time location, performance history, cost rates. The fuel for intelligent scheduling and dispatch.

Domain 4: Parts & Inventory Data

Stock levels (warehouse and van), parts-to-asset compatibility, supplier lead times, consumption history, return rates. Critical for first-time fix rate improvement.

Domain 5: Customer & Contract Data

Account records, SLA terms (structured, not buried in PDFs), entitlements, site access requirements, communication preferences, satisfaction scores.

Domain 6: Geospatial & Contextual Data

Geocoded locations, service territories, network topology, real-time traffic, weather data, historical travel patterns. Essential for route optimization.

Domain 7: Safety & Compliance Data

Permits, inspection records, incident reports, training certifications, lock-out/tag-out logs, regulatory documentation. Non-negotiable in utilities and telecom.

Domain 8: Knowledge & Documentation

Troubleshooting guides, equipment manuals, repair procedures, known-error databases, training materials. The unstructured goldmine that AI can unlock.

For each domain, your data inventory should capture: what attributes exist, which systems hold them, how current and complete they are, who owns them, and which AI use cases depend on them. This becomes your data catalog — the living document that guides every AI decision going forward.

The Utility Data Challenge: 50 Years of History in 50 Systems

Utilities are among the most data-rich and data-siloed organizations on the planet. A typical large utility operates 30–100+ distinct systems, accumulated through decades of organic growth, mergers, and regulatory mandates. The data elicitation challenge here is uniquely complex.

The GIS-to-ERP Mismatch

This is the single most common data challenge in utilities — and it affects nearly every one I've worked with. GIS is maintained by engineering teams focused on spatial accuracy and network topology. ERP is maintained by finance and maintenance teams focused on cost tracking and work management. Over time, they drift apart.

GIS shows 50,000 poles; ERP shows 47,000. Equipment was replaced in the field but only updated in one system. GIS uses facility IDs; ERP uses equipment numbers; no reliable crosswalk exists. Before any AI model can learn from asset data, these two worlds must be reconciled — starting with your most critical asset classes.
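The reconciliation itself starts as simple set arithmetic. A minimal sketch, assuming illustrative facility IDs, equipment numbers, and a partially maintained crosswalk table (none of these identifiers are from a real system):

```python
# GIS facility IDs vs. ERP equipment numbers, joined by a crosswalk
# that -- as is typical -- covers only part of the estate.
gis_facilities = {"P-1001", "P-1002", "P-1003", "P-1004"}
erp_equipment = {"EQ-55", "EQ-56", "EQ-57"}
crosswalk = {"P-1001": "EQ-55", "P-1002": "EQ-56"}

gis_only = gis_facilities - crosswalk.keys()        # spatial records with no ERP match
erp_only = erp_equipment - set(crosswalk.values())  # cost records with no spatial match
reconciled = len(crosswalk) / len(gis_facilities)   # fraction of GIS assets matched
```

The two "only" sets are your work queue: every member is an asset that exists in one world and not the other, and each needs a field check or a record merge before AI can trust it.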

The 50-Year Asset Problem

Utilities manage assets with 30–80 year lifespans. Asset records span multiple generations of record-keeping — some predate computerized systems entirely. Paper-to-digital migrations introduced errors that were never corrected. In many utilities, 15–30% of assets lack accurate installation dates, and inconsistent naming conventions mean the same transformer type might appear as "transformer," "xfmr," "TX," or "distribution transformer" across different systems.

For AI-driven predictive maintenance, this is a showstopper. The fix isn't glamorous: start with your most critical (and highest-cost) asset class, reconcile records across systems, and establish a single naming convention and identifier. Then expand.
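The naming-convention piece of that fix can be as unglamorous as an alias table. A minimal sketch using the transformer variants named above; a real table would be built from the variants actually observed in each system:

```python
import re

# Canonical-name mapping for one asset class (entries from the text above).
ALIASES = {
    "xfmr": "transformer",
    "tx": "transformer",
    "distribution transformer": "transformer",
}

def canonical_asset_type(raw: str) -> str:
    # Normalize case and whitespace, then map known aliases to the agreed name.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return ALIASES.get(key, key)
```

Run it across every system's asset-type field and the records that refuse to normalize become a short, reviewable exception list instead of silent noise in the training data.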

Regulatory Data Considerations

Utility data doesn't exist in a vacuum. NERC CIP standards govern how critical infrastructure data is classified and accessed. State PUC reporting requires accurate reliability metrics (SAIDI, SAIFI, CAIDI). Safety compliance demands traceable training, certification, and incident records. Any AI system that influences decisions related to regulated activities must maintain full data lineage and auditability — "black box" AI is not acceptable in this environment.

The Telecom Data Challenge: Volume, Velocity, and the OSS/BSS Divide

Telecom field service operates at a scale and velocity that utilities rarely match. A major telecom may process 50,000–200,000 work orders per week. The data elicitation challenge isn't just finding data — it's making sense of data that moves fast and spans deeply fragmented systems.

The OSS/BSS Divide

The fundamental data architecture of most telecoms splits into Operations Support Systems (OSS) — managing the network — and Business Support Systems (BSS) — managing customers and billing. Field service sits uncomfortably between both. A technician installing fiber needs data from network planning (OSS), customer records (BSS), workforce management (neither), and inventory systems (sometimes a third silo entirely).

The elicitation challenge: understanding which systems of truth hold which data, and where the gaps between OSS and BSS create blind spots for field operations. Common examples: the network inventory says a customer has copper service; the billing system shows a fiber upgrade order; the field service system has no visibility into either.

Truck Roll Avoidance — The Telecom AI Priority

In telecom, the highest-impact AI use case is often truck roll avoidance — resolving issues remotely before dispatching a technician. The data required: network telemetry (modem status, signal levels, error rates), customer interaction history (did they already power-cycle the equipment?), and remote diagnostic capabilities (can we push a firmware update or reset remotely?).

Elicitation insight: this data often exists but is trapped in network management systems that field service platforms can't access. The integration gap between network operations and field service is the #1 data challenge in telecom AI.
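Once that data is accessible, the triage logic can start very simply. A hypothetical sketch — the field names and thresholds are illustrative, not a real network-management API:

```python
def needs_truck_roll(telemetry: dict, history: dict) -> bool:
    # Hypothetical triage rule: dispatch only when remote options are exhausted.
    if not telemetry.get("modem_online", False):
        return True   # unreachable equipment: no remote fix is possible
    weak_signal = telemetry.get("signal_dbmv", 0.0) < -15.0
    if weak_signal and history.get("power_cycled", False):
        return True   # a reboot didn't fix it: likely a physical-plant issue
    return False      # try a remote reset or firmware push before dispatching
```

Even a rule this crude makes the data gap concrete: if field service can't see `modem_online` or `power_cycled`, the function can't run — which is precisely the OSS integration gap described above.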

Customer Experience Data

Telecoms have richer customer experience data than most industries — NPS scores, churn prediction models, social media sentiment, call center interaction logs. Connecting this data to field service operations (does a missed appointment window correlate with churn? does first-visit resolution predict NPS?) is a high-value AI opportunity that most telecoms haven't fully exploited because the data lives in separate organizational silos.

The Data Sprint: Iterate, Don't Boil the Ocean

The biggest mistake I see organizations make is treating data preparation as a prerequisite that must be completed before AI can begin. This leads to 12–18 month "data cleansing projects" that run out of budget, executive patience, or both — long before any AI value is delivered.

The better approach: Data Sprints — focused 2–4 week cycles that prepare the minimum viable data for a specific AI use case, then iterate.

The Minimum Viable Data (MVD) Concept

For each AI use case, define the minimum data quality threshold needed to run a meaningful pilot — not perfect data, but "good enough to learn from." If you're piloting AI-assisted scheduling, you need current technician skills, accurate job durations, and reliable location data. You don't need a fully reconciled asset registry. Focus the sprint on what the use case actually requires.
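MVD works best when the thresholds are written down per use case and checked mechanically. A minimal sketch — the metric names and threshold values are illustrative, not a standard:

```python
# Per-use-case minimum viable data thresholds (numbers are illustrative).
MVD_THRESHOLDS = {
    "ai_scheduling": {
        "technician_skills_completeness": 0.90,
        "job_duration_accuracy": 0.80,
        "location_geocoded": 0.95,
    },
}

def mvd_gaps(use_case: str, measured: dict) -> list:
    # Return every requirement that falls short of the pilot threshold.
    required = MVD_THRESHOLDS[use_case]
    return [metric for metric, threshold in required.items()
            if measured.get(metric, 0.0) < threshold]
```

An empty gap list means the pilot can start; a non-empty one is the scope of your next data sprint — nothing more.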

Anatomy of a Data Sprint

Week 1: Scope and Profile

Define the AI use case and its data requirements. Run automated profiling on the relevant data domains — completeness rates, value distributions, duplicate percentages. Score each domain on the six quality dimensions: completeness, accuracy, timeliness, consistency, uniqueness, and validity. Identify the critical gaps that would prevent the AI from functioning.
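The automated profiling step needs nothing exotic. A minimal sketch of a per-column profile covering three of the six dimensions (completeness, uniqueness via duplicate rate, and distinct-value counts as a consistency proxy); the sample work orders are illustrative:

```python
def profile_column(rows: list, column: str) -> dict:
    # Basic profile of one attribute: completeness, distinct values, duplicate rate.
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "completeness": len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "duplicate_rate": 1 - len(set(non_null)) / len(non_null) if non_null else 0.0,
    }

work_orders = [
    {"resolution_code": "FIXED"},
    {"resolution_code": ""},        # missing code -- completeness drops to 0.75
    {"resolution_code": "FIXED"},
    {"resolution_code": "NO_ACCESS"},
]
profile = profile_column(work_orders, "resolution_code")
```

Run this per column per domain and the critical-gap list for Week 2 falls out of the numbers rather than out of opinion.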

Week 2: Fix the Critical Gaps

Address the must-fix issues. Standardize a key taxonomy (e.g., work order resolution codes). Clean a critical master data set (e.g., technician skills). Build a temporary integration for a missing data flow (e.g., real-time van inventory). Focus on the 20% of data issues causing 80% of the AI quality impact.

Week 3: Connect and Validate

Wire the data into the AI use case environment. Run validation: does the model produce sensible results with the cleaned data? Where does it still break? What additional data quality issues surface only when the AI actually tries to use the data? These "AI-revealed" quality issues are impossible to find through profiling alone.

Week 4: Operationalize and Plan Next Sprint

Establish automated quality monitoring for the data domains you've cleaned. Document what was fixed and how. Update your data catalog. Identify what the next sprint should tackle — either expanding the current use case's data foundation or preparing for the next use case.

Each sprint delivers tangible progress. The AI pilot gets better data to learn from. The data team gets specific, purposeful work instead of abstract "data cleansing." And leadership sees continuous movement instead of a multi-year project with a distant payoff.

The Anti-Patterns: What Not to Do

These mistakes are so common they're practically industry norms. Recognizing them early saves months of wasted effort.

"Boil the Ocean" — Trying to Fix All Data First

The instinct to clean all data across all domains before starting any AI initiative is understandable but fatal. It takes too long, costs too much, and delivers no value until it's "done" — which it never is. Data quality is a continuous journey, not a destination. Fix data in service of specific use cases, not as an abstract goal.

"Build It and They Will Come" — Deploying AI Without Data Prep

The opposite extreme: buying an AI platform, pointing it at whatever data exists, and hoping for the best. AI trained on bad data produces bad results with high confidence — which is worse than no AI at all, because it erodes trust. Even the smartest algorithm can't compensate for 30% missing resolution codes or a technician skills matrix that hasn't been updated in two years.

"IT Owns It" — Centralizing Data Responsibility Without Operations

Data elicitation led exclusively by IT produces technically accurate inventories that miss operational reality. IT knows what systems exist and how they're integrated. Operations knows what data is actually used, what's trusted, and what workarounds people rely on. Effective data elicitation requires joint ownership — IT brings the technical lens, operations brings the truth.

"Perfect Before Pilot" — Demanding Flawless Data for a Proof of Concept

A POC is supposed to prove whether an AI approach is viable, not whether your data is perfect. Set realistic quality thresholds for the pilot, document known data limitations, and use the pilot itself to reveal which quality issues actually matter. Some data problems that look critical on paper turn out to be irrelevant to the specific AI model. You won't know which ones until you try.

The Human Side: Who Owns Data Elicitation?

The most effective model I've seen in field service organizations is joint ownership with clear roles: IT maps where data lives and how it flows, operations defines what each field means and which sources are actually trusted, and a named steward on the business side owns quality for each domain.

The worst model: assigning data elicitation to a single analyst in IT and hoping they'll figure it out. Data is an organizational asset; organizing it requires organizational involvement.

Getting Started: Your First 30 Days

You don't need a major initiative to begin. Here's a practical 30-day starting point:

Days 1–10: System and Shadow Inventory

Catalog every system that touches field service. Survey department leads about what tools, spreadsheets, and workarounds their teams rely on. You're building the first comprehensive view of where your data lives — including the places nobody thought to look.

Days 11–20: Work Order Lifecycle Deep Dive

Follow 3–5 real work orders from creation to completion. At each step, document what data was created, consumed, and missing. Note every point where someone had to call someone else, check a separate system, or rely on personal knowledge. These friction points are your data gaps — and your AI opportunities.

Days 21–30: Quality Baseline and Priority Map

For each data domain, score quality across the six dimensions (completeness, accuracy, timeliness, consistency, uniqueness, validity). Map these scores against your highest-value AI use cases. The intersection of "poorest data quality" and "highest AI impact" is where your first data sprint should focus.
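The priority map can be computed directly from those scores. A minimal sketch, assuming quality and AI-impact scores normalized to [0, 1]; the domain names and numbers are illustrative:

```python
def sprint_priority(quality_scores: dict, ai_impact: dict) -> list:
    # Rank domains: worst quality x highest AI impact first.
    return sorted(
        quality_scores,
        key=lambda d: (1 - quality_scores[d]) * ai_impact.get(d, 0.0),
        reverse=True,
    )

quality = {"work_orders": 0.8, "workforce": 0.5, "assets": 0.4}
impact = {"work_orders": 0.9, "workforce": 0.9, "assets": 0.4}
first_sprint = sprint_priority(quality, impact)[0]
```

In this toy example the workforce domain wins: its data is poor and intelligent scheduling depends on it heavily, so that is where the first sprint goes.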

At the end of 30 days, you'll have something most organizations never build: a clear, honest picture of your data landscape, anchored in operational reality. That's the foundation everything else builds on.

Data elicitation isn't a one-time project — it's the discipline that makes AI trustworthy. The organizations that build this muscle first will be the ones that deploy AI with confidence, not hope. And confidence is what separates a pilot from a production system.