TL;DR

This is Phase 10 of the FSx for ONTAP S3 Access Points serverless pattern library. Building on Phase 9, Phase 10 delivers:

  • FPolicy event-driven integration: ONTAP FPolicy → ECS Fargate TCP server → SQS → EventBridge custom bus. The shared event-ingestion pipeline is verified end-to-end; UC-specific dispatch follows in Phase 11.
  • Multi-account StackSets: All 17 UC templates validated for StackSets compatibility (0 errors) + admin/execution role templates
  • UC-specific alarm profiles: BATCH / REALTIME / HIGH_VOLUME — three profiles with workload-appropriate thresholds
  • Cost optimization: Dynamic MaxConcurrency controller + business-hours scheduling (rate(1 hour) vs rate(6 hours))
  • E2E verification: NFSv3 ✅, NFSv4.0 ✅, NFSv4.1 ✅, SMB ✅, NFSv4.2 ❌ (unsupported by ONTAP FPolicy)

In short: Phase 9 completed the operational baseline. Phase 10 builds and verifies the shared event-ingestion pipeline that the pattern library has needed since Phase 1 — without waiting for AWS to ship native S3AP notifications. UC-specific dispatch wiring follows in Phase 11.

📊 Repository stats: 17 industry use cases + event-driven FPolicy + 6 FlexCache/FlexClone patterns | 1,499+ tests | 126 test files | Python 3.12 + CloudFormation (SAM Transform)

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns


1a. Trigger Mode Decision Guide

Before diving into the FPolicy implementation, here is the decision framework for choosing between the three trigger modes this library supports:

| Mode | Choose when | Avoid when |
| --- | --- | --- |
| POLLING | Hourly or batch processing is acceptable; simplest operating model | Sub-minute detection is required |
| EVENT_DRIVEN | Near-real-time ingestion is required and event loss during reconnect is acceptable | Compliance requires durable event capture without Persistent Store |
| HYBRID | You need faster detection plus periodic reconciliation to fill gaps | You want the simplest operating model |

| Dimension | POLLING | EVENT_DRIVEN | HYBRID |
| --- | --- | --- | --- |
| Detection latency | Minutes to hours | Seconds | Seconds + periodic catch-up |
| Monthly cost (infra only) | ~$6-21 | ~$32-60 | ~$42-86 |
| Operational complexity | Low | High | Highest |
| Event durability | High (full scan each time) | Medium (gap during restart) | High (reconciliation fills gaps) |
| ONTAP dependency | None (S3 AP only) | High (FPolicy config) | High |

Decision flow:

  1. Real-time detection not required → POLLING (start here for most workloads)
  2. Real-time required + Persistent Store available (ONTAP 9.14.1+) → EVENT_DRIVEN
  3. Real-time required + no Persistent Store → HYBRID (polling fills gaps)
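
The decision flow above is small enough to express as a function. This is only a sketch: the two boolean inputs and the function name are illustrative assumptions, not part of the library.

```python
from enum import Enum


class TriggerMode(str, Enum):
    POLLING = "POLLING"
    EVENT_DRIVEN = "EVENT_DRIVEN"
    HYBRID = "HYBRID"


def choose_trigger_mode(realtime_required: bool,
                        persistent_store_available: bool) -> TriggerMode:
    """Mirror the three-step decision flow (hypothetical helper)."""
    if not realtime_required:
        return TriggerMode.POLLING       # step 1: start here for most workloads
    if persistent_store_available:
        return TriggerMode.EVENT_DRIVEN  # step 2: ONTAP 9.14.1+ with Persistent Store
    return TriggerMode.HYBRID            # step 3: polling fills the gaps
```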

Full guide: Trigger Mode Decision Guide


1. FPolicy Event-Driven Architecture

Background: why FPolicy

Every UC in this pattern library runs on a polling model: EventBridge Scheduler → Discovery Lambda → ListObjectsV2. This works, but it means latency is bounded by the polling interval (typically 1 hour). AWS still does not support GetBucketNotificationConfiguration for S3 Access Points attached to FSx for ONTAP volumes (FR-2 remains open).

ONTAP FPolicy is a file-operation notification framework built into every ONTAP system. In external server mode, it sends TCP notifications for create/write/delete/rename events to a registered server. By connecting this to AWS services, we get near-real-time event-driven processing without waiting for FR-2.

This implementation builds on Shengyu Fang's reference implementation, adapted for the 17-UC pattern library architecture.

S3 API Does Not Remove File-System Semantics

S3 Access Points for FSx for ONTAP expose file data through S3 APIs, but authorization is a two-layer model:

  1. AWS-side authorization: IAM identity-based policy, S3 Access Point resource policy, VPC endpoint policy, SCP — all relevant policies are evaluated and all must permit the request
  2. File-system-side authorization: The file system identity (UNIX UID or Windows domain\user) associated with the access point determines what file operations are authorized based on that user's permissions on the underlying volume

This means that least-privilege design must cover both AWS IAM and ONTAP file permissions. A common mistake is securing only the IAM layer while using a root-equivalent file system identity (UID 0), which grants full access to all files regardless of IAM restrictions.

Key behaviors:

  • If the file system user has read-only access, write requests through the access point are blocked — even if IAM permits s3:PutObject
  • Attaching an S3 access point does not change the volume's behavior when accessed via NFS or SMB
  • Block Public Access is always enabled and cannot be changed for FSx for ONTAP access points
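
As a mental model, the two layers combine with a logical AND. The sketch below is purely illustrative (the class and field names are assumptions) and does not model real IAM policy evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessDecision:
    """Hypothetical model of the two authorization layers."""
    aws_policies_allow: bool   # IAM, access point policy, VPC endpoint policy, SCP
    file_system_allows: bool   # permissions of the UNIX/Windows identity on the volume

    @property
    def allowed(self) -> bool:
        # A request succeeds only when both layers permit it.
        return self.aws_policies_allow and self.file_system_allows


# IAM permits s3:PutObject, but the file system user is read-only: blocked.
put_by_readonly_user = AccessDecision(aws_policies_allow=True, file_system_allows=False)
```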

For the full authorization model documentation, see S3AP Authorization Model.

Architecture

FSx ONTAP SVM (file operations: create/write/delete/rename)
│
│ TCP (port 9898, async mode)
▼
FPolicy External Server (ECS Fargate, ARM64 Python 3.12)
│
├─ [Near-real-time] → SQS Ingestion Queue
│                        │
│                        │ Event Source Mapping
│                        ▼
│                     Bridge Lambda → EventBridge Custom Bus
│                                          │
│                                   UC1 reference rule (Phase 10)
│                                          │
│                                   UC1 Step Functions
│
│                                   ── Phase 11 ──
│                                   UC-specific dispatch rules
│                                   → Step Functions / Lambda (per-UC)
│
└─ [Batch] → JSON Lines log (FSxN S3AP) → Log Query Lambda

ONTAP initiates the TCP connection to the FPolicy server — not the other way around. This means the server simply listens on a port. Because ONTAP maintains a persistent TCP control channel with keep-alive, Lambda is not viable (15-minute timeout). ECS Fargate provides the long-running TCP listener without OS management overhead.

Why not NLB?

The initial design placed an NLB in front of Fargate for IP stability. In our AWS verification, the NLB path established a TCP connection, but the FPolicy handshake did not complete.

Additional verification (2026-05-14): We tested both preserve_client_ip.enabled=true and false on the NLB target group. In both configurations, ONTAP did not establish an FPolicy session through the NLB. The only connections observed from the NLB IP were health checks (TCP connect → immediate close at 10-second intervals). No FPolicy NEGO_REQ was received via the NLB path.

One plausible explanation is that ONTAP FPolicy's external-engine expects a direct TCP connection to the primary-servers IP. When the NLB forwards the connection to a Fargate task with a different IP, the FPolicy session establishment conditions are not met — possibly because ONTAP validates the connection endpoint or because the NLB's connection lifecycle (idle timeout, deregistration delay) interferes with the persistent control channel that FPolicy requires.

This remains documented as an observed deployment limitation in our environment (FSxN ONTAP 9.17.1P6, internal NLB with IP targets), not a universal NLB claim. If your environment differs, testing the NLB path is straightforward — set the NLB IP as the external-engine primary-servers and check vserver fpolicy show-engine for connection state.

Solution: Fargate task direct IP connection. IP stability is handled by an EventBridge-triggered Lambda that updates the ONTAP external-engine configuration when the Fargate task IP changes:

ECS Task State Change (RUNNING) → EventBridge Rule → IP Updater Lambda
→ ONTAP REST API: disable policy → update engine primary_servers → enable policy

The direct-IP model assumes a single active Fargate task (DesiredCount: 1) and requires network reachability from the FSxN SVM data LIFs to the task ENI on the FPolicy TCP port. This design prioritizes connection stability over horizontal scalability; multi-task active-active configurations are not supported due to FPolicy session constraints. Security groups must allow ONTAP-initiated inbound connections on port 9898. During Fargate task restarts, event handling depends on the FPolicy policy's is-mandatory setting: with is-mandatory=false (our configuration), file operations continue unblocked but notifications are dropped until the new task connects. See the Event durability note below for Persistent Store guidance.
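
The disable → update → enable ordering of the IP Updater can be sketched as follows. The `OntapClient` interface is a hypothetical wrapper around the ONTAP REST API, not a real library, and re-enabling the policy in a `finally` block is a design choice of this sketch rather than confirmed behavior of the repository's Lambda.

```python
from typing import Protocol


class OntapClient(Protocol):
    """Hypothetical wrapper around the ONTAP REST API (names are assumptions)."""
    def disable_policy(self, policy: str) -> None: ...
    def update_engine_primary_servers(self, engine: str, servers: list[str]) -> None: ...
    def enable_policy(self, policy: str) -> None: ...


def update_fpolicy_server_ip(client: OntapClient, policy: str,
                             engine: str, task_ip: str) -> None:
    """Point the external engine at the new Fargate task IP."""
    client.disable_policy(policy)
    try:
        client.update_engine_primary_servers(engine, [task_ip])
    finally:
        # Re-enable even if the update fails, so the policy is never left disabled.
        client.enable_policy(policy)
```

In a Lambda handler, `task_ip` would come from the ENI details of the ECS Task State Change event.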

TriggerMode parameter

Phase 10 introduces the TriggerMode parameter scaffolding and verifies the shared FPolicy → SQS → EventBridge pipeline end-to-end. A reference implementation is deployed in the legal-compliance (UC1) template. UC-specific Step Functions dispatch rules are intentionally deferred to Phase 11.

| Value | Phase 10 behavior |
| --- | --- |
| POLLING (default) | Existing EventBridge Scheduler + Discovery Lambda |
| EVENT_DRIVEN | Shared FPolicy event pipeline enabled; UC-specific dispatch wiring is Phase 11 |
| HYBRID | Polling remains active; event-driven deduplication path prepared for Phase 11 |

Default POLLING ensures zero impact on existing deployments.

NFSv3 write-complete delay

When FPolicy fires a notification, the file write may not be complete — particularly with NFSv3 which lacks close semantics. The server inserts a configurable delay (WRITE_COMPLETE_DELAY_SEC, default 5s) after receiving NOTI_REQ, and Step Functions include retry logic for incomplete files.
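
A minimal sketch of the delay, assuming an asyncio-based server and a hypothetical `send` coroutine that forwards the event to SQS. Only the `WRITE_COMPLETE_DELAY_SEC` name and its 5-second default come from the implementation described above.

```python
import asyncio
import os
from typing import Awaitable, Callable, Mapping


def write_complete_delay(env: Mapping[str, str] = os.environ) -> float:
    """Read WRITE_COMPLETE_DELAY_SEC, falling back to the 5-second default."""
    return float(env.get("WRITE_COMPLETE_DELAY_SEC", "5"))


async def forward_after_delay(
    event: dict,
    send: Callable[[dict], Awaitable[None]],  # hypothetical SQS sender
    delay_sec: float,
) -> None:
    """Wait before forwarding so the writer has time to finish the file."""
    await asyncio.sleep(delay_sec)
    await send(event)
```

A fixed delay is only a heuristic: a large file may still be mid-write after 5 seconds, which is why the downstream retry logic is still needed.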

Event durability note

This Phase 10 implementation is designed for near-real-time processing, not end-to-end durable event capture during Fargate task restarts. With is-mandatory=false, ONTAP drops notifications when no FPolicy server is connected — file operations continue unblocked but events are lost. Environments that cannot tolerate event loss should evaluate ONTAP FPolicy Persistent Store (ONTAP 9.14.1+), available for asynchronous non-mandatory external FPolicy policies. Persistent Store queues events on the SVM during server disconnection and can replay them when the external server reconnects. Note that queue sizing, replay handling, and deduplication require application-level design. This is a Phase 11+ candidate (design-dependent).

Note (Phase 12 update): This Phase 10 article documents the initial event-durability boundary. Persistent Store replay validation is covered in Phase 12, where replay behavior was tested for 5-event and 20-event disconnect scenarios with zero event loss confirmed. Use the Deployment Profiles guide to choose the appropriate durability level for your workload.

Deployment Profiles — From PoC to Compliance

The event-driven FPolicy pattern supports three deployment profiles, each with clear boundaries for event loss tolerance and operational complexity:

| Dimension | PoC/Demo | Production | Compliance-sensitive |
| --- | --- | --- | --- |
| FPolicy Server | Fargate (direct IP) | EC2 static IP or NLB | EC2 static IP + NLB |
| is-mandatory | false | true (ONTAP 9.15.1+) | true (ONTAP 9.15.1+) |
| Persistent Store | Not required | Recommended | Required (ONTAP 9.14.1+) |
| Retry / Dedup | Best-effort | DynamoDB idempotency | DynamoDB + S3 Object Lock lineage |
| Alarm Profile | Minimal (error only) | Full (latency + error + backlog) | Full + audit trail |
| Event Loss Tolerance | Acceptable (30-60s gap) | Near-zero (retry compensates) | Zero (Persistent Store + audit) |

Key design decisions:

  • is-mandatory=true (ONTAP 9.15.1+): Blocks file operations when the FPolicy server is unavailable — prevents event loss but impacts availability. Use only with redundant server deployment.
  • Persistent Store (ONTAP 9.14.1+): Buffers events in a dedicated SVM volume during server disconnection. Events are replayed in order upon reconnection. Sizing: 1 GB ≈ 2M events at ~500 bytes each.
  • Replay recovery time: 100K buffered events at 100 events/sec = ~17 minutes to catch up.
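
The sizing and replay figures above are simple arithmetic. This sketch reproduces them, assuming 1 GB = 10^9 bytes and a constant replay rate; the function names are illustrative.

```python
def persistent_store_capacity(volume_bytes: int, avg_event_bytes: int = 500) -> int:
    """Approximate number of events that fit in the Persistent Store volume."""
    return volume_bytes // avg_event_bytes


def replay_minutes(buffered_events: int, events_per_sec: float) -> float:
    """Time to drain the backlog at a constant replay rate."""
    return buffered_events / events_per_sec / 60


capacity = persistent_store_capacity(10**9)   # 1 GB at ~500 bytes per event
catch_up = replay_minutes(100_000, 100)       # 100K events at 100 events/sec
```

Here `capacity` is 2,000,000 events and `catch_up` is about 16.7 minutes, matching the rounded figures in the list.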

The progression path is incremental: PoC → Production → Compliance-sensitive, adding capabilities at each stage without redesigning the core architecture.

Full profile documentation: Deployment Profiles


2. E2E Verification Results

Protocol support matrix

| Protocol | Mount Option | FPolicy NOTI_REQ | Result |
| --- | --- | --- | --- |
| NFSv3 | vers=3 | ✅ Immediate | Works |
| NFSv4.0 | vers=4.0 | ✅ Immediate | Works |
| NFSv4.1 | vers=4.1 | ✅ Immediate | Works |
| NFSv4.2 | vers=4.2 | ❌ Not sent | Unsupported |
| NFSv4 (auto) | vers=4 | ❌ Not sent | Negotiates to 4.2 |
| SMB/CIFS | - | - | Works |

Key finding: mount -o vers=4 on modern Linux negotiates to NFSv4.2, which ONTAP FPolicy does not support. Always use vers=4.1 explicitly. This is documented in NetApp's FPolicy Auditing FAQ.

ONTAP version note: NFSv4.1 FPolicy monitoring support was introduced in ONTAP 9.15.1. Earlier versions support SMB, NFSv3, and NFSv4.0 only. Our test environment runs ONTAP 9.17.1P6, which includes NFSv4.1 support. See NetApp FPolicy event configuration documentation for the full protocol support matrix by ONTAP version.
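
A pre-flight check on mount options can catch the vers=4 pitfall before any events go missing. This is a sketch with hypothetical function names. It treats a bare `vers=4` as unsupported because it may negotiate to 4.2, and it does not check the ONTAP version that NFSv4.1 monitoring requires.

```python
from typing import Optional

# Versions that produced NOTI_REQ in the matrix above (4.1 needs ONTAP 9.15.1+).
FPOLICY_NFS_VERSIONS = {"3", "4.0", "4.1"}


def nfs_version(mount_options: str) -> Optional[str]:
    """Return the vers=/nfsvers= value from a comma-separated option string."""
    for opt in mount_options.split(","):
        key, _, value = opt.strip().partition("=")
        if key in ("vers", "nfsvers"):
            return value
    return None


def fpolicy_will_notify(mount_options: str) -> bool:
    """True only when the NFS version is pinned to one FPolicy monitors."""
    return nfs_version(mount_options) in FPOLICY_NFS_VERSIONS
```

The option string can come from the mount command itself or from the options column of /proc/mounts, which shows the negotiated version.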

Path extraction bug fix

ONTAP sends file paths in XML format within NOTI_REQ:

<PathNameType>WIN_NAME</PathNameType><PathName>\file.txt</PathName>

The initial regex extraction left residual XML tags in the file_path field. Fixed by adding an _extract_xml_value() helper with multi-tag fallback and residual tag stripping.

Before fix:

{"file_path": "<PathNameType>WIN_NAME</PathNameType><PathName>\\file.txt</PathName>"}

After fix:

{"file_path": "file.txt"}
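
One way such a helper could work is sketched below. This is not the repository's `_extract_xml_value()`. It is a hypothetical reimplementation of the same idea: try a list of tags in order, strip any residual tags, and drop leading path separators.

```python
import re

_TAG = re.compile(r"<[^>]+>")


def extract_xml_value(xml: str, tags: tuple[str, ...] = ("PathName",)) -> str:
    """Return the text of the first matching tag, with residual tags stripped."""
    for tag in tags:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", xml, re.DOTALL)
        if m:
            value = m.group(1)
            break
    else:
        value = xml  # no known tag matched: fall back to the raw text
    value = _TAG.sub("", value)           # remove any tags still embedded in the value
    return value.strip().lstrip("\\/")    # drop leading "\" or "/" separators


raw = r"<PathNameType>WIN_NAME</PathNameType><PathName>\file.txt</PathName>"
file_path = extract_xml_value(raw)
```

For the NOTI_REQ fragment shown earlier, `file_path` is `file.txt`. Note that `<PathName>` does not match `<PathNameType>` because the pattern requires the closing `>` immediately after the tag name.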

volume_name / svm_name resolution

ONTAP's NOTI_REQ body does not always include volume and SVM names in a parseable location. Resolution strategy:

  1. Extract from NEGO_REQ session context (SVM name available at handshake)
  2. Fall back to environment variables (SVM_NAME, VOLUME_NAME) set in the ECS task definition
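
The two-step resolution can be sketched as a small fallback function. The function name and the shape of the session value are assumptions; only the `SVM_NAME` and `VOLUME_NAME` variable names come from the task definition described above.

```python
import os
from typing import Mapping, Optional


def resolve_name(session_value: Optional[str], env_key: str,
                 env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the value captured at NEGO_REQ time, else the environment variable."""
    return session_value or env.get(env_key)


# The handshake supplied an SVM name, but no volume name was available.
task_env = {"SVM_NAME": "svm-from-env", "VOLUME_NAME": "vol1"}
svm_name = resolve_name("FSxN_OnPre", "SVM_NAME", task_env)
volume_name = resolve_name(None, "VOLUME_NAME", task_env)
```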

Complete E2E flow (verified)

NFSv3 file create (tee /mnt/fsxn/file.txt)
→ ONTAP FPolicy NOTI_REQ
→ Fargate FPolicy Server receives event
→ SQS SendMessage
→ Bridge Lambda → EventBridge Custom Bus

Actual EventBridge event:

{
  "detail-type": "FPolicy File Operation",
  "source": "fsxn.fpolicy",
  "detail": {
    "event_id": "2175e878-1e0c-48ef-a8b3-53664d5d5b06",
    "operation_type": "create",
    "file_path": "test-eb-e2e-1778707951.txt",
    "volume_name": "vol1",
    "svm_name": "FSxN_OnPre",
    "timestamp": "2026-05-13T21:32:37.680626+00:00",
    "client_ip": "10.0.10.67"
  }
}
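
The Bridge Lambda's core job is to turn SQS records into EventBridge entries like the one above. The sketch below assumes each SQS message body is the JSON `detail` object; the function names and the bus name are hypothetical. The `Source` and `DetailType` values are taken from the verified event.

```python
import json
from typing import Iterator

SOURCE = "fsxn.fpolicy"
DETAIL_TYPE = "FPolicy File Operation"
MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def build_entries(sqs_event: dict, bus_name: str) -> list[dict]:
    """Turn SQS records (JSON bodies) into EventBridge PutEvents entries."""
    return [
        {
            "Source": SOURCE,
            "DetailType": DETAIL_TYPE,
            # Round-trip through json to reject malformed bodies early.
            "Detail": json.dumps(json.loads(record["body"])),
            "EventBusName": bus_name,
        }
        for record in sqs_event.get("Records", [])
    ]


def batches(entries: list[dict], size: int = MAX_ENTRIES_PER_CALL) -> Iterator[list[dict]]:
    """Yield entries in chunks small enough for one PutEvents call."""
    for i in range(0, len(entries), size):
        yield entries[i:i + size]
```

In a handler, each batch would be passed to the boto3 EventBridge client as `put_events(Entries=batch)`, and the response's `FailedEntryCount` checked before acknowledging the SQS messages.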

3. Unified UC Directory Structure

Phase 10 introduces event-driven-fpolicy/ as a first-class shared pattern directory, using the same structure as the UC directories. It is not counted as one of the 17 industry UCs — it is a shared event-ingestion reference implementation that any UC can consume via EventBridge rules.

event-driven-fpolicy/
├── docs/                    # 8 languages (ja, en, ko, zh-CN, zh-TW, fr, de, es)
│   ├── architecture.md      # + .en.md, .ko.md, etc.
│   └── demo-guide.md
├── functions/
│   ├── ip_updater/          # Fargate IP → ONTAP REST API
│   └── sqs_to_eventbridge/  # Bridge Lambda
├── schemas/
│   └── fpolicy-event-schema.json
├── server/
│   ├── Dockerfile           # ARM64 Python 3.12
│   ├── fpolicy_server.py    # TCP listener + SQS sender
│   └── requirements.txt
├── tests/
├── README.md                # + 7 language variants
├── template.yaml            # Fargate deployment (ComputeType=fargate)
└── template-ec2.yaml        # EC2 deployment (ComputeType=ec2)

A single template.yaml with a ComputeType parameter (fargate/ec2) uses CloudFormation Conditions to select the appropriate resource set. The EC2 variant uses a t4g.micro with a static private IP, so no IP update Lambda is needed, at roughly $4/month. The Fargate variant avoids EC2 management but requires task-IP tracking and has a higher baseline cost (~$10/month for Fargate compute alone, plus VPC Endpoints). Actual cost varies by region, runtime hours, and VPC Endpoint configuration.


4. Multi-Account StackSets

StackSets compatibility validator

New validator scripts/check_stacksets_compatibility.py checks all 17 UC templates for:

  1. Hardcoded Account IDs — 12-digit numeric strings that would break in other accounts
  2. Resource name uniqueness — names must include !Sub with AccountId or StackName
  3. Export name collisions — exports that would conflict across accounts
  4. VPC/Subnet/SecurityGroup parameterization — must not be hardcoded

Result: 17/17 templates, 0 errors, 0 warnings.
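
To illustrate the first check, here is a minimal sketch of hardcoded account ID detection. It is not the code in check_stacksets_compatibility.py. The regex and function name are assumptions, and a real validator would need an allow-list, since any standalone 12-digit number matches.

```python
import re

# Exactly 12 digits, not embedded in a longer run of digits.
ACCOUNT_ID = re.compile(r"(?<!\d)\d{12}(?!\d)")


def find_hardcoded_account_ids(template_text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for standalone 12-digit numbers."""
    hits = []
    for lineno, line in enumerate(template_text.splitlines(), start=1):
        for m in ACCOUNT_ID.finditer(line):
            hits.append((lineno, m.group()))
    return hits


template = (
    "RoleA: arn:aws:iam::123456789012:role/example\n"
    "RoleB: !Sub arn:aws:iam::${AWS::AccountId}:role/example\n"
)
violations = find_hardcoded_account_ids(template)
```

Only the first line is flagged; the `!Sub` form with `${AWS::AccountId}` resolves per account and passes.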

StackSets role templates

| Template | Purpose |
| --- | --- |
| shared/cfn/stacksets-admin.yaml | Admin account role for StackSet management |
| shared/cfn/stacksets-execution.yaml | Target account execution role (least-privilege) |

The execution role uses an Organization ID condition in its trust policy — accounts outside the Organization cannot assume it. Permissions are scoped to Lambda, Step Functions, DynamoDB, S3, CloudWatch, EventBridge, SNS, and Secrets Manager only.
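
An Organization-scoped trust policy typically relies on the `aws:PrincipalOrgID` global condition key. The sketch below builds such a policy as a Python dict. The principal ARN and Organization ID are placeholders, and the actual statement in stacksets-execution.yaml may differ.

```python
def execution_role_trust_policy(admin_role_arn: str, org_id: str) -> dict:
    """Trust policy that only principals inside the given Organization can use."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": admin_role_arn},
            "Action": "sts:AssumeRole",
            # Callers outside the Organization fail this condition.
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
        }],
    }


# Placeholder values for illustration only.
policy = execution_role_trust_policy(
    "arn:aws:iam::111122223333:role/example-stackset-admin",
    "o-exampleorgid",
)
```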

Automatic deployment

With AutoDeployment: Enabled on the StackSet, new accounts joining the Organization automatically receive the UC templates. No manual intervention required.

Scope note: Phase 10 validates that templates can be distributed safely across accounts via StackSets (deployment compatibility). It does not yet validate cross-account FSxN S3AP data access, shared VPC ownership, or centralized operations across accounts. Those runtime cross-account patterns are Phase 11+ work.


5. Alarm Profiles and Cost Optimization

UC-specific alarm profiles

Not all UCs tolerate failures equally. A batch genomics pipeline (UC3) can tolerate a higher failure rate than a real-time compliance monitor (UC12). Phase 10 introduces three profiles:

| Profile | Failure Rate Threshold | Error Threshold | Target Workloads |
| --- | --- | --- | --- |
| BATCH | 10% | 3/hour | Periodic batch processing (UC1-5, UC9) |
| REALTIME | 5% | 1/hour | Real-time processing (UC10-14) |
| HIGH_VOLUME | 15% | 5/hour | High-volume file processing (UC6-8, UC15-17) |

Each UC template now has an AlarmProfile parameter (BATCH / REALTIME / HIGH_VOLUME / CUSTOM). The CUSTOM option exposes CustomFailureThreshold and CustomErrorThreshold for fine-grained control.
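
The profile lookup, including the CUSTOM override, can be sketched as follows. In the templates this logic lives in CloudFormation parameters and conditions. The Python names here are illustrative, with the threshold values taken from the profile table.

```python
from typing import NamedTuple, Optional


class Thresholds(NamedTuple):
    failure_rate_pct: float
    errors_per_hour: int


PROFILES = {
    "BATCH": Thresholds(10, 3),
    "REALTIME": Thresholds(5, 1),
    "HIGH_VOLUME": Thresholds(15, 5),
}


def resolve_thresholds(profile: str,
                       custom_failure: Optional[float] = None,
                       custom_error: Optional[int] = None) -> Thresholds:
    """Return thresholds for a named profile, or the custom values for CUSTOM."""
    if profile == "CUSTOM":
        if custom_failure is None or custom_error is None:
            raise ValueError("CUSTOM requires both custom thresholds")
        return Thresholds(custom_failure, custom_error)
    return PROFILES[profile]  # raises KeyError for an unknown profile
```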

Dynamic MaxConcurrency controller

shared/max_concurrency_controller.py calculates optimal Map state parallelism based on actual file volume:

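def calculate_max_concurrency(
    detected_file_count: int,
    ontap_rate_limit: int = 100,
    api_calls_per_file: int = 3,
    upper_bound: int = 40,
) -> int:
    """Return the Map state MaxConcurrency for the detected file count."""
    optimal = min(
        detected_file_count,                     # never more workers than files
        ontap_rate_limit // api_calls_per_file,  # stay within the ONTAP rate limit
        upper_bound,                             # hard ceiling
    )
    # Floor at 1 so that zero detected files never yields 0.
    return max(optimal, 1)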

This replaces the static MaxConcurrency: 10 from Phase 8. For 500 files with default settings, it calculates min(500, 100 // 3, 40) = min(500, 33, 40) = 33, which is 3.3x the previous parallelism without exceeding the configured ONTAP rate limit.

Business-hours cost scheduling

With EnableCostScheduling=true, two EventBridge Schedulers dynamically adjust the polling frequency:

| Time Period | Schedule |
| --- | --- |
| Business hours (weekday 09:00-18:00 JST) | rate(1 hour) |
| Off-hours (weekday 18:00-09:00 + weekends) | rate(6 hours) |

BusinessHoursStart and BusinessHoursEnd parameters allow customization. The Cost Scheduler emits an EstimatedMonthlySavings CloudWatch metric for visibility.
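
The schedule selection reduces to a time-window check. This sketch assumes the start and end parameters are whole hours in JST, which is an assumption about their format, and uses a fixed UTC+9 offset since Japan observes no daylight saving time.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # fixed offset: Japan has no DST


def polling_rate(now: datetime, start_hour: int = 9, end_hour: int = 18) -> str:
    """Pick the schedule expression for the given timezone-aware instant."""
    local = now.astimezone(JST)
    is_weekday = local.weekday() < 5  # Monday=0 .. Friday=4
    if is_weekday and start_hour <= local.hour < end_hour:
        return "rate(1 hour)"
    return "rate(6 hours)"


# 2026-05-13 is a Wednesday; 01:00 UTC is 10:00 JST.
rate_now = polling_rate(datetime(2026, 5, 13, 1, 0, tzinfo=timezone.utc))
```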

S3 Access Points Performance Considerations

Key performance characteristics (from AWS documentation):

  • Latency: Tens of milliseconds (consistent with S3 bucket access)
  • Throughput: Depends on the FSx file system's provisioned throughput capacity — S3 AP, NFS, and SMB all share the same throughput pool
  • Object size limit: 5 GB for uploads (PutObject); downloads (GetObject) can be larger
  • Storage class: FSX_ONTAP only; SSE-FSX encryption only

Design implications for serverless pipelines:

  1. Lambda memory → network bandwidth: Higher Lambda memory allocates more network bandwidth. For 10 MB file processing, 1,769 MB (1 vCPU) provides ~600 Mbps.
  2. Step Functions Map concurrency: Limit MaxConcurrency based on FSx provisioned throughput. Formula: fsxn_throughput / per_lambda_throughput. Example: 512 MBps ÷ 50 MBps per Lambda ≈ 10 concurrent executions.
  3. ListObjectsV2 pagination: MaxKeys=1000 per page. For 10,000 files = 10 pages × ~50ms = ~500ms minimum. Use Prefix filtering to reduce scope.
  4. Shared throughput: S3 AP, NFS, and SMB all share the same FSx throughput capacity. Account for existing NFS/SMB workloads when sizing Map concurrency.
  5. Retry strategy: Use botocore.config.Config(retries={"mode": "adaptive"}) for automatic backoff on SlowDown (503) responses.
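
The concurrency and pagination estimates in items 2 and 3 are back-of-the-envelope arithmetic. The sketch below reproduces them with illustrative function names; the 50 MBps per Lambda and ~50 ms per page figures are the example values from the list, not measurements.

```python
import math


def map_concurrency(fsx_throughput_mbps: float, per_lambda_mbps: float) -> int:
    """fsxn_throughput / per_lambda_throughput, rounded down, at least 1."""
    return max(1, int(fsx_throughput_mbps // per_lambda_mbps))


def list_pages(object_count: int, max_keys: int = 1000) -> int:
    """Number of ListObjectsV2 pages needed to enumerate the objects."""
    return math.ceil(object_count / max_keys)


def min_listing_ms(object_count: int, per_page_ms: int = 50) -> int:
    """Lower bound on sequential listing time."""
    return list_pages(object_count) * per_page_ms


concurrency = map_concurrency(512, 50)   # 512 MBps file system, 50 MBps per Lambda
listing_ms = min_listing_ms(10_000)      # 10 pages at ~50 ms each
```

Because NFS and SMB clients draw from the same throughput pool, the throughput passed to `map_concurrency` should be the capacity left over after existing workloads, not the full provisioned figure.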

Full analysis: S3AP Performance Considerations


6. Test Results

| Category | Count | Result |
| --- | --- | --- |
| Phase 10 new tests | 62 | All PASS ✅ |
| Property-based tests (Hypothesis) | 7 properties × 100-200 iterations | All PASS ✅ |
| Existing tests (Phase 1-9) | 982 | No regressions ✅ |
| Total | 1044+ | All PASS |

Property-based tests

| Property | What it verifies |
| --- | --- |
| FPolicy event round-trip | Serialize → deserialize produces equivalent object |
| MaxConcurrency bounds | Result always ≥ 1 and ≤ upper_bound |
| MaxConcurrency correctness | Result matches the min() formula |
| Zero files → 1 | Empty input never produces 0 |
| StackSets Account ID detection | Known violations are always caught |
| Cost savings non-negativity | Estimated savings ≥ 0 for all inputs |
| Same rate → ~0 savings | Equal business/off-hours rates produce near-zero savings |

Validator results

| Validator | Result |
| --- | --- |
| check_s3ap_iam_patterns.py | 17/17 clean ✅ |
| check_handler_names.py | 87 handlers, 0 issues ✅ |
| check_conditional_refs.py | 17 templates, 0 issues ✅ |
| check_stacksets_compatibility.py | 17 templates, 0 errors ✅ |
| _check_sensitive_leaks.py | 160 images, 0 leaks ✅ |
| cfn-guard IAM security | Advisory, 0 new violations ✅ |

7. Deployment Learnings

Several issues surfaced during AWS verification that are worth documenting:

| Issue | Root Cause | Fix |
| --- | --- | --- |
| NLB path: FPolicy handshake fails | Suspected: ONTAP FPolicy expects direct TCP to the primary-servers IP; NLB target routing does not satisfy session establishment (tested with preserve_client_ip true and false) | Direct Fargate IP + EventBridge IP auto-update |
| jsonschema 4.18+ fails on ARM64 Lambda | rpds-py native dependency | Pin to 4.17.x |
| SCHEMA_PATH differs between Lambda and local | Different working directories | Fallback path resolution |
| Guard Hook rejects Condition-based Resource: "*" | Overly strict rule | Updated rule to allow Resource: "*" when a Condition exists |
| ECR pull fails in private subnet | Missing VPC Endpoints | Added ECR, STS, S3, Logs, SQS endpoints |
| KEEP_ALIVE timeout race | Server timeout = keep_alive_interval | Increased server timeout to 300s |
| NFSv4 events not firing | vers=4 negotiates to unsupported 4.2 | Explicit vers=4.1 |

7a. Beyond AI/ML — Enterprise Workload Examples

This pattern is not limited to AI/ML demos. The S3 Access Points architecture applies to any enterprise file data on FSx for ONTAP:

  • SAP peripheral files and exported business documents — IDoc exports, ABAP report outputs, BW data extracts. Process without changing SAP file interfaces.
  • EDI / HULFT landing zones — Automatic validation and format conversion of received files. No changes to existing EDI/HULFT infrastructure.
  • Audit evidence and compliance reports — Periodic integrity checks, retention management, with NTFS permissions preserved.
  • Batch output from EC2-based business applications — Add serverless post-processing pipelines without changing application output paths.
  • Scanned documents and regulated records — OCR, classification, PII detection on documents stored for long-term retention.

The design principle: File data stays on FSx for ONTAP. S3 Access Points provide the bridge to AWS-native automation, AI/ML, and analytics services — without data movement, without changing existing NFS/SMB access patterns. Existing backup (SnapMirror), DR, and access controls remain unchanged.

This positioning matters for partner and SI proposals: the value is not "replace your file server" but "connect your existing file data to AWS services without migration."

Full examples with architecture diagrams: Enterprise Workload Examples


8. Next Phase Outlook

Phase 10 established the shared event-ingestion pipeline (FPolicy → SQS → EventBridge). Phase 11 will wire those events into UC-specific processing. Candidates:

  1. TriggerMode rollout to all 17 UCs: Expand the reference implementation from UC1 to all templates, with UC-specific EventBridge dispatch rules
  2. FPolicy → UC-specific Step Functions dispatch: EventBridge rules matching file path prefixes/extensions to UC targets
  3. protobuf format evaluation: ONTAP 9.15.1+ supports protobuf for higher-performance notifications
  4. Cross-Account Observability live verification: Deploy the shared-services-observability template and validate metric aggregation
  5. Persistent Store evaluation: Phase 11+ design-dependent work for compliance-sensitive environments that cannot tolerate event loss during Fargate task restarts
  6. FR-2 migration path: When AWS ships native S3AP notifications, the TriggerMode parameter provides a clean migration — switch from EVENT_DRIVEN to native events without changing UC logic
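
As a preview of what candidate 2 might look like, the sketch below builds an EventBridge event pattern that matches on file path. It is an illustration, not the Phase 11 design. The `legal/` prefix and `.pdf` suffix are hypothetical, and only the `source`, `detail-type`, and `detail` field names come from the verified Phase 10 event.

```python
def uc_dispatch_pattern(path_match: dict, operations: list[str]) -> dict:
    """Build an event pattern for FPolicy events on the custom bus."""
    return {
        "source": ["fsxn.fpolicy"],
        "detail-type": ["FPolicy File Operation"],
        "detail": {
            "operation_type": operations,
            # EventBridge content filters such as {"prefix": ...} or {"suffix": ...}
            "file_path": [path_match],
        },
    }


# Hypothetical rule: newly created files under a "legal/" prefix.
legal_pattern = uc_dispatch_pattern({"prefix": "legal/"}, ["create"])
# Hypothetical rule: newly created PDF files anywhere on the volume.
pdf_pattern = uc_dispatch_pattern({"suffix": ".pdf"}, ["create"])
```

Multiple filters in the same array are OR'd by EventBridge, so a rule needing both a prefix and an extension would have to express that differently.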

Why Native S3AP Notifications Still Matter

This FPolicy-based pipeline proves that customers need event-driven processing for FSx for ONTAP S3 Access Points. However, it also quantifies where a native AWS-managed notification feature would eliminate undifferentiated heavy lifting:

| Operational burden | Current (FPolicy) | With native notifications |
| --- | --- | --- |
| Long-running TCP listener | Fargate 24/7 (~$30-50/month) | Not needed |
| Fargate task IP tracking | IP Updater Lambda + ONTAP REST API | Not needed |
| ONTAP external-engine reconfiguration | On every deployment | Not needed |
| FPolicy protocol dependency | NFSv4.2 not supported | Protocol-independent |
| Event durability semantics | Requires Persistent Store (ONTAP 9.14.1+) | S3-equivalent at-least-once |
| Cross-account event routing | SQS → Bridge Lambda → EventBridge → cross-account | Standard EventBridge rules |

Implementation complexity: 15-20 CloudFormation resources and 2 Lambda functions (IP Updater + Bridge) for FPolicy, vs an estimated 3-5 resources for native EventBridge integration.

The FPolicy implementation is not a replacement for native S3AP notifications — it is evidence of customer demand and an interim event-driven pattern. The operational complexity documented here directly maps to the value a native feature would deliver.

Full analysis: Native S3AP Notifications Evidence

Partner/SI Delivery Checklist

For partners and SIs proposing this pattern to enterprise customers, a structured delivery checklist is available covering:

  1. Customer workload classification — SAP-adjacent / file server / regulated records / AI analytics
  2. Trigger mode selection — POLLING / EVENT_DRIVEN / HYBRID based on latency and durability requirements
  3. Deployment profile — PoC / Production / Compliance-sensitive with clear boundaries
  4. Access model design — IAM + S3 AP policy + ONTAP file permissions (dual-layer)
  5. Network model — Private VPC / VPC Origin AP / Cross-Account / Shared Services
  6. Operating model — Customer-operated / partner-operated / managed service
  7. Success criteria — Latency, throughput, cost, auditability, recovery behavior

The checklist also includes a 4-phase PoC implementation guide (environment prep → POLLING verification → EVENT_DRIVEN verification → evaluation) and FAQ for common partner questions.

Full checklist: Partner/SI Delivery Checklist


Who should care about Phase 10?

  • Platform teams get an event-driven alternative to polling — near-real-time latency instead of hourly polling intervals
  • Security teams get StackSets compatibility validation ensuring no hardcoded account IDs leak across environments
  • Operations teams get workload-appropriate alarm thresholds that reduce alert fatigue
  • Finance teams get fewer off-hours polling invocations through business-hours scheduling, with savings surfaced as a CloudWatch metric
  • Storage teams get a documented FPolicy integration pattern with protocol-level verification results
  • Multi-account teams get ready-to-deploy StackSets admin/execution roles with Organization-scoped trust
  • Partners and SIs get a PoC-ready event-driven alternative for customers who cannot wait for native S3AP notifications
  • Regulated workload owners get a clear event-durability boundary: near-real-time by default, Persistent Store required when event loss is unacceptable
  • SAP / ERP teams get a pattern for connecting peripheral files (IDoc, HULFT, batch output) to AWS AI/analytics without changing existing file interfaces

Conclusion

Phase 10 solves the problem that has been deferred since Phase 1: how do you get event-driven processing from FSx for ONTAP when S3AP native notifications don't exist?

The answer is ONTAP FPolicy — a mature notification framework that predates S3 Access Points by over a decade. By connecting it to ECS Fargate → SQS → EventBridge, Phase 10 established the shared event-ingestion pipeline and the TriggerMode parameter foundation needed to support polling, event-driven, and hybrid modes. UC-specific dispatch remains the main Phase 11 focus. The default remains POLLING, so existing deployments are unaffected.

The E2E verification confirmed that NFSv3, NFSv4.0, NFSv4.1 (ONTAP 9.15.1+), and SMB all work. NFSv4.2 does not — and the most common failure mode is mount -o vers=4 silently negotiating to 4.2. This is now documented and the setup guide recommends explicit version pinning.

Beyond FPolicy, Phase 10 matures the operational model: StackSets deployment compatibility for multi-account distribution, alarm profiles for workload-appropriate monitoring, and cost scheduling for environments that don't need 24/7 polling. Combined with the 6-validator CI pipeline and 1044+ passing tests, the pattern library is ready for production-style multi-account template distribution, while runtime cross-account data-path validation remains Phase 11+ work.


Design Guides

The following design guides have been added to the repository:

| Document | Description |
| --- | --- |
| S3AP Authorization Model | Dual-layer authorization (IAM + file system) |
| Deployment Profiles | PoC / Production / Compliance-sensitive |
| Trigger Mode Decision Guide | POLLING / EVENT_DRIVEN / HYBRID |
| Enterprise Workload Examples | SAP, EDI, audit, batch output |
| S3AP Performance | Throughput, Lambda sizing, concurrency |
| Native Notifications Evidence | Feature request evidence |
| Partner/SI Delivery Checklist | Partner/SI proposal and delivery guide |

Repository: github.com/Yoshiki0705/FSx-for-ONTAP-S3AccessPoints-Serverless-Patterns


Update Note

This article describes the Phase 10 baseline — the first verified shared FPolicy ingestion pipeline. The event-driven pipeline is expanded across all 17 UCs in Phase 11 and operationally hardened in Phase 12 with Persistent Store replay validation, SLO observability, capacity guardrails, and secrets rotation. Phase 13 adds FlexClone/FlexCache serverless automation.

Use the FPolicy event-driven mode for PoC and near-real-time ingestion. For regulated or compliance-sensitive workloads, evaluate Persistent Store, replay handling, deduplication, and operational runbooks before treating the pipeline as durable. See Deployment Profiles for guidance.


Previous phases: Phase 1 · Phase 7 · Phase 8 · Phase 9

Next phases: Phase 11 (UC-specific dispatch) · Phase 12 (Persistent Store replay + SLO hardening) · Phase 13 (FlexClone/FlexCache automation)


📢 Update (2026-05-23)

This article is part of the FSx for ONTAP S3 Access Points series.
The latest addition — Phase 13: From Serverless Patterns to Field-Ready Reference Architecture — is now available:

👉 Read Phase 13

Phase 13 adds FlexCache/FlexClone serverless automation, split-path S3AP monitoring, SLO runbooks, Partner/SI delivery checklist, and a complete field-ready baseline for informed evaluation.