Building a Fabric Data Agent with On‑Premises SQL Server Data
Created on 2026-04-13 10:42
Published on 2026-04-13 10:43
Many organizations still rely on on‑premises SQL Server for core operational workloads — finance, billing, ERP, manufacturing, and regulatory systems — that cannot be easily migrated to the cloud. At the same time, the business is demanding:
Conversational access to data (“Ask the data” experiences)
AI‑powered insights without building fragile ETL pipelines
Faster time‑to‑value from analytics and AI
Strong governance, security boundaries, and system ownership
Microsoft Fabric addresses this challenge by allowing organizations to bring AI to their data before moving their data to AI.
By leveraging SQL Server mirroring into Microsoft Fabric, organizations can continuously replicate on-premises data into OneLake, making it immediately available for analytics and AI—without disrupting source systems. On top of this foundation, semantic models and Fabric Data Agents enable governed, natural-language interactions powered by Microsoft Copilot.
This architecture allows teams to:
Preserve existing operational systems: Keeping workloads on platforms like Microsoft SQL Server avoids risk and disruption. You maintain stability while extending value through analytics in Microsoft Fabric.
Eliminate complex ETL pipelines: Continuous mirroring replaces fragile, batch ETL processes—reducing maintenance, cost, and data latency.
Deliver trusted, AI-assisted analytics at scale: With governed data and semantic models, tools like Microsoft Copilot provide accurate, consistent insights—building trust and enabling broad adoption.
1.– Reference architecture
In scenarios like this, I usually anchor the design around four key architectural principles:
Operational workloads remain on-premises: Core systems stay in platforms like Microsoft SQL Server to ensure stability, performance, and compliance.
Analytics and AI are executed in Fabric: Offloading analytics to Microsoft Fabric separates workloads and enables scalable, modern data processing.
Semantic meaning is explicitly defined: Business logic and definitions are formalized (e.g., via semantic models), ensuring consistent and accurate interpretation of data.
AI interacts only with curated, governed data: Tools like Microsoft Copilot access only trusted, well-defined datasets—reducing errors and improving reliability.
This separation is essential to achieving scalability, maintaining trust, and enforcing strong governance.
Here is my proposed reference architecture:
2.– End‑to‑end steps to follow
To build a minimum viable product (MVP) aligned with the proposed reference architecture, we need to follow five key steps.
2.1.– Prepare the on‑premises SQL Server
Start by ensuring your source system meets mirroring requirements:
Supported versions: SQL Server 2016–2022 (CDC-based) or 2025
Tables must include primary keys and use supported data types
Avoid unsupported features such as temporal tables, Always Encrypted, and in-memory (memory-optimized) tables
Mirroring is strictly read-only from Fabric’s perspective—source systems are never modified.
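Before starting, it helps to verify these prerequisites directly on the source server. The sketch below is illustrative, assuming a hypothetical dbo.Orders table; the first query lists user tables that lack a primary key, and the second shows how change data capture (used by the CDC-based versions) can be enabled at the database and table level:

```sql
-- Prerequisite check: user tables without a primary key,
-- since each mirrored table requires one.
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE NOT EXISTS (
    SELECT 1 FROM sys.indexes AS i
    WHERE i.object_id = t.object_id AND i.is_primary_key = 1
);

-- Enable change data capture on the database, then on a table
-- (dbo.Orders is an assumed example table).
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;
```

In practice the Fabric mirroring setup can handle capture configuration for you; the point of the check is to catch unsupported tables before they silently fail to replicate.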
2.2.– Configure secure connectivity
If SQL Server is behind a firewall:
Install On‑premises Data Gateway or VNet Data Gateway
Only outbound connectivity is required
No inbound firewall rules are opened
This ensures secure, enterprise‑grade connectivity.
An on-premises data gateway is software that you install within your local network. It acts as a secure bridge, allowing cloud services such as Microsoft Fabric to reach your local data sources without exposing them to inbound traffic. You need an on-premises data gateway version 3000.214.2 or higher to support Fabric pipelines.
2.3.– Create the Fabric mirrored database
In Microsoft Fabric:
Create a Mirrored SQL Server database
Select tables to replicate (up to 1,000)
Provide credentials with required permissions
Start mirroring
Fabric automatically:
Takes an initial snapshot
Continuously tracks changes
Materializes data in OneLake (Delta format)
Exposes a SQL Analytics Endpoint
At this point, data is available — but not yet AI‑ready.
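Once the initial snapshot completes, the mirrored tables can be queried immediately through the SQL Analytics Endpoint. A minimal illustration, assuming a hypothetical replicated dbo.Orders table and typical column names:

```sql
-- Illustrative T-SQL against the SQL Analytics Endpoint of the
-- mirrored database; table and column names are assumptions.
SELECT TOP (10)
    OrderID,
    CustomerID,
    OrderDate,
    TotalDue
FROM dbo.Orders
ORDER BY OrderDate DESC;
```

This confirms replication is working, but these are still raw operational tables, which is why the semantic layer in the next step matters.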
2.4.– Create the semantic model (the most critical step)
Mirroring moves data. Semantic modeling creates meaning.
Without a semantic layer, a Data Agent behaves like a chatbot guessing over raw tables—leading to poor accuracy and low trust.
2.4.1.– Where the semantic model lives in Fabric
For mirrored data coming from Microsoft SQL Server, semantic meaning should be built through a structured layer—not directly on raw tables. This is typically achieved using:
SQL Analytics Endpoint: This is the primary interface in Microsoft Fabric for querying mirrored data. It provides a governed, scalable layer where you can define and expose curated datasets for analytics and AI.
SQL views, functions, and naming conventions: These are used to transform raw tables into business-friendly structures—flattening joins, hiding technical fields, and applying clear, consistent naming. This is where data becomes understandable and usable.
(Optional) Power BI semantic models: With Power BI, you can further enrich the semantic layer by defining measures, relationships, and business logic for reporting and advanced analytics scenarios.
Raw mirrored tables should never be exposed directly to AI tools like Microsoft Copilot, as they lack context and governance—leading to inaccurate or misleading results.
2.4.2.– Create business‑ready semantic views (this is the core!)
Use the SQL Analytics Endpoint to define curated views that:
Use business-friendly terminology
Flatten joins
Hide technical keys
Define the correct level of granularity
Apply core business rules
For example, a view like "semantic.SalesOrders" should be exposed to AI instead of underlying tables such as "dbo.Orders" or "dbo.Customers".
Why this matters:
Data Agents reason at the table/view level
Views define meaning—not just structure
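A minimal sketch of such a view follows, built on the dbo.Orders and dbo.Customers tables mentioned above; the column names and the Revenue rule are assumptions for illustration:

```sql
-- Hypothetical business-ready view: business-friendly names,
-- flattened join, technical keys hidden, one row per order line.
CREATE VIEW semantic.SalesOrders
AS
SELECT
    o.OrderDate,
    c.CustomerName,
    c.Country,
    o.Quantity,
    o.NetAmount AS Revenue          -- assumed business rule: Revenue = net amount
FROM dbo.Orders    AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID; -- join key resolved here, not by the AI
```

Consumers (human or AI) now see CustomerName and Revenue rather than surrogate keys and raw amounts, which is exactly the meaning a Data Agent needs.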
2.4.3.– Organize schemas intentionally
Then, it's important to organize schemas based on business purpose. I recommend a structured schema strategy, which improves both usability and AI performance. For example:
facts.Sales
dimensions.Customer
semantic.ExecutiveSalesOverview
This improves:
Discoverability: Organizing data into clear schemas and business-friendly names makes it much easier for users—and AI tools—to find the right data. Instead of searching through technical tables, they can quickly identify relevant datasets (e.g., “Sales” or “Customer”) in Microsoft Fabric.
AI reasoning and ranking: AI tools like Microsoft Copilot rely on structure and naming to interpret data. Well-defined semantic layers help AI prioritize the correct tables, understand relationships, and generate more accurate answers.
Governance clarity: Clear separation between raw, curated, and semantic layers makes it obvious which data is trusted and approved. This improves control, auditing, and compliance—ensuring only governed data is used for analytics and AI.
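A sketch of that schema layout in T-SQL, assuming your endpoint or warehouse permits schema creation (in T-SQL, each CREATE SCHEMA must run in its own batch, hence the GO separators):

```sql
-- Purpose-driven schemas matching the naming strategy above.
CREATE SCHEMA facts;
GO
CREATE SCHEMA dimensions;
GO
CREATE SCHEMA semantic;
GO
```

The schema name itself then carries intent: anything under semantic is curated and safe to expose to AI, while facts and dimensions hold the modeled building blocks.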
2.4.4.– Define grain and time explicitly
AI systems struggle without clear data granularity. Always define:
Explicit date columns
Derived fields (Year, Month)
Clearly named measures (Revenue, Quantity, Units)
This significantly improves answer accuracy and consistency.
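As a sketch, a curated view can make grain and time explicit directly in its column list; the table and column names below are assumptions:

```sql
-- Hypothetical view: one row per order line, with an explicit date,
-- derived date parts, and clearly named measures.
CREATE VIEW semantic.DailySales
AS
SELECT
    CAST(o.OrderDate AS date) AS OrderDate,  -- explicit date column
    YEAR(o.OrderDate)         AS [Year],     -- derived field
    MONTH(o.OrderDate)        AS [Month],    -- derived field
    o.Quantity                AS Units,      -- clearly named measure
    o.NetAmount               AS Revenue     -- clearly named measure
FROM dbo.Orders AS o;
```

With Year and Month precomputed, a question like "revenue by month" maps directly onto existing columns instead of forcing the agent to infer date logic.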
2.5.– Create and configure the Fabric Data Agent
The Fabric Data Agent is the layer that turns your curated data into a usable, conversational experience. It acts as the bridge between business users and the semantic model—allowing people to interact with data using natural language instead of SQL or reports.
2.5.1.– What is the role of the Data Agent?
A Data Agent in Microsoft Fabric is an AI-powered interface (leveraging Microsoft Copilot) that:
Translates natural language questions into queries
Uses the semantic layer to interpret business meaning
Returns governed, explainable answers
Importantly, the Data Agent does not access raw data directly—it operates strictly on top of the curated semantic layer you designed earlier.
Why this matters: Without the Data Agent, only technical users can extract value from data. With it, analytics becomes accessible to everyone in the business.
2.5.2.– Why this step is critical in the architecture
This is where the architecture becomes truly valuable. Everything before this step (mirroring, modeling, governance) prepares the data—but the Data Agent is what delivers that value to users.
Its importance comes down to three factors:
Abstraction: Users don’t need to know schemas, joins, or tools
Consistency: Answers are based on governed definitions
Scale: Hundreds or thousands of users can query data simultaneously
In other words, the Data Agent is the consumption layer for AI-driven analytics.
2.5.3.– How business users interact with it
Business users can engage with the Data Agent in a conversational way, for example:
“What was total revenue last quarter?”
“Top 10 customers by sales this year”
“Why did sales drop in March?”
The agent:
Interprets the question using the semantic model
Generates the appropriate query via the SQL Analytics Endpoint
Returns a clear, contextualized answer (often with explanations)
This removes dependency on analysts and dramatically reduces time-to-insight.
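For illustration only, a question like "Top 10 customers by sales this year" might be translated into a query over the curated semantic.SalesOrders view; the column names here are assumptions, and the actual query an agent generates will differ:

```sql
-- Illustrative shape of an agent-generated query; not actual agent output.
SELECT TOP (10)
    CustomerName,
    SUM(Revenue) AS TotalRevenue
FROM semantic.SalesOrders
WHERE YEAR(OrderDate) = YEAR(GETDATE())
GROUP BY CustomerName
ORDER BY TotalRevenue DESC;
```

Note that the agent only has to aggregate and filter: the joins, naming, and the definition of Revenue were already settled in the semantic layer.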
2.5.4.– Configuration best practices
To ensure accuracy and trust:
Scope the agent carefully: include only semantic.* views—never raw tables
Define clear instructions: guide the agent with domain-specific rules (e.g., “Revenue = Net Sales”)
Limit ambiguity: avoid exposing overlapping or duplicate datasets
Test with real business questions: validate responses against expected outcomes
These steps directly impact answer quality and user trust.
2.5.5.– Reuse and integration options
One of the key strengths of the Data Agent in Microsoft Fabric is that it is not limited to a single interface—it can be reused and exposed across the enterprise in multiple ways.
Register in Microsoft 365 Agent Catalog:
Publish the Data Agent into Microsoft 365 so it becomes available directly in tools like Microsoft Teams and Microsoft Office applications. This brings analytics into the flow of work, allowing users to interact with governed data without switching tools.
It improves adoption by meeting users where they already operate, increases discoverability of AI capabilities across the organization, and ensures consistent, controlled usage of trusted data sources.
Use as an MCP client (Model Context Protocol):
The Data Agent can also be exposed via MCP to integrate with external AI systems and custom applications, including solutions built on Microsoft Copilot.
This enables organizations to embed governed enterprise data into custom copilots and third-party AI experiences while maintaining a single, consistent semantic layer. It standardizes how AI systems access and retrieve enterprise data, strengthens governance across heterogeneous AI platforms, and extends Fabric data capabilities beyond Microsoft-native applications.
3.– Other ways to ingest data from the Microsoft SQL family
A common misconception when discussing Microsoft Fabric is that SQL Mirroring is the only way to ingest data from databases in the SQL family. While — in my opinion — Mirroring is the most native and opinionated integration pattern Fabric offers for SQL-based systems, it is not the only ingestion mechanism available.
Microsoft Fabric supports multiple data ingestion paths for SQL-family databases, each designed for different architectural goals, latency requirements, and operational constraints. What distinguishes Mirroring is not exclusivity, but intent.
Mirroring is optimized for scenarios where organizations want to continuously replicate operational SQL data into OneLake using CDC-based replication, transforming transactional tables into analytics-ready Delta Lake tables. This approach fully decouples operational workloads from analytical consumption, ensuring that reporting, AI, and exploratory analytics do not compete with mission-critical OLTP systems. Once mirrored, data becomes read-optimized, governed, and immediately accessible across Fabric engines, including Power BI via Direct Lake.
However, Mirroring is not mandatory to bring SQL data into Fabric.
For organizations that require:
Batch ingestion
Complex transformations
Data enrichment or joins across multiple systems
Controlled refresh windows
Or bidirectional data movement
Fabric Data Factory pipelines remain a first-class ingestion mechanism for SQL Server, Azure SQL Database, Azure SQL Managed Instance, and other SQL-based engines. These pipelines provide flexibility and control that Mirroring intentionally abstracts away.
What Fabric does not support for the SQL family is Shortcuts. Shortcuts operate at the storage layer and are designed to virtualize file-based data in object stores such as ADLS or S3. Since SQL databases do not expose data as open files, shortcut-based virtualization is not applicable to SQL engines.
In short, Microsoft Fabric integrates the SQL family via Mirroring—not Shortcuts—when the goal is continuous, low-latency analytics, but it also supports pipeline-based ingestion when architectural control, transformation logic, or operational flexibility are required. Understanding this distinction allows data leaders to choose the right pattern without forcing a single ingestion model across all workloads.
4.– Real‑World Example related to Financial Planning
Consider an enterprise financial planning scenario where sales data originates in a SQL Server–based system. That data is ingested into Microsoft Fabric—using Mirroring for near‑real‑time replication or pipelines for curated ingestion—and stored as governed Delta tables in OneLake.
A business user asks M365 Copilot:
“Generate a quarterly sales forecast for AMERICAS and schedule a review meeting.”
M365 Copilot acts as the orchestrator and delegates the request across specialized agents. A Fabric Data Agent retrieves real‑time sales figures from OneLake, translating the natural‑language request into secure, policy‑aware queries. A Document Writing Agent (for example, one powered by an LLM via the Azure OpenAI Service) generates the forecast report using those results. Finally, a Scheduling Agent creates a review meeting by interacting with Microsoft Graph.
The user receives a single, coherent response—forecast included, meeting scheduled—without interacting with dashboards, writing SQL, or understanding the underlying orchestration.
In this model, Microsoft Fabric serves as the data and analytics backbone. SQL systems remain the system of record, while Fabric enables governed, conversational access to that data through agents. The result is faster decision‑making, lower friction for business users, and a scalable foundation for agent‑driven enterprise workflows.
5.– Final takeaway
Mirroring enables AI access. Semantic models enable AI understanding. Data Agents enable AI reasoning.
Together, these capabilities of Microsoft Fabric allow organizations to:
Keep operational data where it belongs
Enable governed, AI‑assisted analytics
Modernize safely, incrementally, and pragmatically