iGaming cloud infrastructure: architecture, performance, and compliance guide

DevOps and Cloud Architecture Expert Co-founder of Gart

July 1, 2026

The global online gambling and sports betting market is projected to surpass $126 billion by 2027, and the platforms competing for that revenue routinely support more than 50,000 concurrent players during peak events. That scale creates a narrow engineering problem: an iGaming cloud infrastructure has to deliver sub-30-millisecond response times, survive a 40x traffic spike during a World Cup final, and simultaneously prove to four or five regulators that player data never left an approved border. Most general-purpose cloud architectures are built for one or two of those constraints. iGaming platforms need all three at once.

This guide walks through how modern iGaming operators actually build for that combination — compute and network topology, stateful WebSocket scaling, database concurrency control, and the jurisdictional rules that shape where every byte of player data can physically sit. It draws on current AWS, OVHcloud, Continent 8, and Google Cloud reference architectures, alongside the statutory frameworks operators must satisfy in Malta, Germany, Brazil, and New Jersey.

TL;DR

Real-time betting and live-dealer games require sub-30ms round-trip latency, which pushes core transactional logic onto single-tenant bare-metal servers rather than shared public cloud instances.
WebSocket-based session persistence needs sticky routing, consistent hashing, and a pub/sub layer (Redis or Kafka) to synchronize state across edge nodes.
Database layers combine ACID-compliant engines (PostgreSQL, MySQL InnoDB) for ledgers with MVCC for read-heavy audit paths and in-memory stores for session state.
Malta, Germany, Brazil, and New Jersey each impose different physical server localization and data residency rules — there is no single compliant architecture that works everywhere.
Hybrid edge appliances (AWS Outposts, Google Distributed Cloud) let operators keep regulated workloads on sovereign hardware while running CDN and analytics in the public cloud.

Why standard cloud architectures fall short for iGaming

In live sports betting, a 500-millisecond delay is a financial vulnerability, not just a UX inconvenience. It opens an arbitrage window for “court-siding” — placing bets on events that have already concluded by exploiting broadcast delay. That single constraint reshapes the entire infrastructure decision tree.

Traditional multi-tenant public cloud environments introduce the “noisy neighbor” effect: virtualized workloads sharing physical hardware cause unpredictable jitter and round-trip time spikes. For a game engine calculating live odds or generating random numbers under regulatory audit, that unpredictability is unacceptable. This is why iGaming operators consistently isolate core transactional and game-logic engines on single-tenant bare-metal servers or private clouds, frequently orchestrated through OpenStack and custom management APIs, so that no other tenant competes for the same CPU, memory, or network resources.

The hardware baseline

The table below summarizes the technical profile operators typically specify for compute nodes running RNG, betting logic, and ledger transactions.

Component	Typical configuration	Why it matters
Compute	Dual Intel Xeon Scalable Silver/Gold, high base clock	Minimizes RNG and betting-logic execution time; avoids CPU scheduler queue delays
Memory	96–256 GB DDR4/DDR5 ECC RAM	Real-time correction of single-bit memory errors; prevents crashes during peak load
Storage	Dual NVMe SSDs in RAID	Sustains high concurrent write IOPS without thread stalls
Network	Dual-bonded 100–200 Gbps NICs, Tier-1 peering	Maximizes burst capacity, minimizes network hops
Colocation	Tier III+/IV facilities, N+1 or 2N power redundancy	Supports up to 99.993% uptime; isolates localized power failures

The cost trade-off matters as much as the technical spec. Public cloud egress charges can reach $0.09 per gigabyte on platforms like AWS, and that adds up fast when a live match generates continuous odds-update traffic to tens of thousands of sockets. Dedicated server pricing is predictable month over month — which is exactly the property operators need when a single high-volume event can otherwise erode margin through unplanned cloud consumption.

Stateful sessions: scaling WebSockets without losing game state

Live odds, live-dealer video, and real-time game state depend on persistent, bidirectional connections. Standard HTTP request-response cycles carry too much overhead — repeated TCP handshakes and verbose headers — for that job, so platforms upgrade to WebSocket: a single full-duplex TCP socket established once and held open for the duration of play.

Client                                                    Server
|------- HTTP GET /ws (Upgrade: websocket) ------------->|   [Initial Handshake]
|<------ HTTP 101 Switching Protocols -------------------|   [Protocol Upgraded]
|                                                        |
|================ ESTABLISHED TCP STREAM ================|
|<------ Binary/Text Frames (client frames XOR-masked) ->|   [Active Game Play]
|<------ Ping / Pong Heartbeats ------------------------>|   [Idle Maintenance]

Every frame sent from client to server is masked with a 32-bit XOR key, a safeguard RFC 6455 mandates so that attacker-controlled payloads cannot poison caches in intermediary proxies. Platforms also typically apply permessage-deflate compression to shrink repetitive JSON payloads, while leaving control frames such as ping, pong, and close uncompressed to protect connection stability.
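The masking step itself is small enough to show directly. The sketch below is illustrative only (the function name `mask_payload` is ours, not part of any library) and checks itself against the "Hello" example frame given in RFC 6455.

```python
import os


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with the 4-byte masking key, cycling through the key."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


# Example frame from RFC 6455: "Hello" masked with key 37 fa 21 3d.
key = bytes.fromhex("37fa213d")
masked = mask_payload(b"Hello", key)
print(masked.hex())  # 7f9f4d5158

# XOR is its own inverse, so the server unmasks with the same function.
unmasked = mask_payload(masked, key)

# A real client draws a fresh, unpredictable key for every frame.
fresh_key = os.urandom(4)
```

Because the same function both masks and unmasks, the cost on either side is one pass over the payload. In production this is handled inside the WebSocket library rather than in application code.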

The harder problem is scaling this statefully. Because each socket holds an active player session in memory, you cannot route a reconnecting player to an arbitrary backend node the way you would with stateless HTTP. Operators typically combine four techniques:

Sticky sessions with IP hashing — Nginx’s ip_hash (or equivalent) maps a client to the same backend node on reconnect, avoiding expensive cross-server state sync.
Consistent hashing — minimizes the percentage of session keys that must be remapped when application shards scale up or down, preventing hotspots.
Distributed pub/sub — a message broker (Redis or Apache Kafka) propagates state changes instantly across edge nodes, since players in the same game room are often connected to different physical servers.
Connection pooling and keep-alives — heartbeat frames at fixed intervals prevent silent termination by intermediary infrastructure (ALBs, Cloudflare edges, ISP firewalls), with exponential-backoff reconnection logic on the client side to preserve gameplay state after a drop.
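To make the consistent-hashing item concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name, node names, and the choice of 100 virtual nodes per shard are all assumptions made for illustration. Real deployments usually rely on the load balancer's or client library's built-in implementation.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative names throughout)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._hashes = []  # sorted positions on the ring
        self._owners = []  # node that owns the position at the same index
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # The first 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._owners.insert(idx, node)

    def node_for(self, session_id):
        # Walk clockwise to the first position at or after the key's hash.
        idx = bisect.bisect_left(self._hashes, self._hash(session_id))
        return self._owners[idx % len(self._owners)]


sessions = [f"session-{n}" for n in range(2000)]
ring = HashRing(["game-node-a", "game-node-b", "game-node-c", "game-node-d"])
before = {s: ring.node_for(s) for s in sessions}

ring.add("game-node-e")  # scale out by one shard
after = {s: ring.node_for(s) for s in sessions}

moved = [s for s in sessions if before[s] != after[s]]
print(f"{len(moved) / len(sessions):.0%} of sessions remapped")
```

Going from four shards to five should remap roughly a fifth of the sessions, and every remapped session lands on the new shard. A naive `hash(key) % node_count` scheme would instead reshuffle most sessions.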

5G rollout has pushed the achievable latency floor down to under 10 milliseconds, versus roughly 200ms on legacy 4G — which is why real-time multiplayer engines and fast-action sports betting now target sub-30ms round-trip time, well below the 100ms threshold that’s acceptable for standard casino systems.

Database architecture: concurrency control for the ledger

The database layer is the system of record for account balances, open wagers, and transaction history, and it’s where most iGaming platforms actually fail under load — not at the network edge. Maintaining integrity at scale requires a hybrid design: ACID-compliant relational databases (PostgreSQL, MySQL InnoDB) for ledger entries and balances, in-memory key-value stores (Redis) for session state, and document databases (MongoDB) for telemetry.

Without strict concurrency control, simultaneous writes to the same row — a player spinning a slot while a deposit posts — produce dirty reads, lost updates, and phantom reads. There are three concurrency paradigms in active use, each with a distinct trade-off profile:

Paradigm	Mechanism	Best for	Trade-off
Pessimistic (PCC)	Exclusive/shared locks (e.g., `SELECT FOR UPDATE`) acquired upfront	Ledger balances, jackpot pools, payment processing	Guarantees integrity under contention; higher latency from lock queuing and deadlock risk
Optimistic (OCC)	Transactions proceed lock-free on private copies, version-checked at commit	Low-contention profile updates, config lookups	Minimal latency at low contention; “rollback storms” under high write contention
MVCC	Writes create new timestamped row versions; readers never block writers	High-frequency reads, ledger audits, session lookups	Eliminates read-write contention; increases storage overhead and vacuuming load
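As a rough illustration of the optimistic row in the table, the sketch below implements a commit-time version check using Python's built-in `sqlite3` module. The `profiles` table, its columns, and the helper names are hypothetical. SQLite is used only because it runs anywhere; a production ledger would sit on PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    "player_id INTEGER PRIMARY KEY, display_name TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO profiles VALUES (1, 'alice', 1)")
conn.commit()


def read_profile(conn, player_id):
    """Return (display_name, version) without taking any lock."""
    return conn.execute(
        "SELECT display_name, version FROM profiles WHERE player_id = ?",
        (player_id,),
    ).fetchone()


def update_name(conn, player_id, new_name, expected_version):
    """Apply the write only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE profiles SET display_name = ?, version = version + 1 "
        "WHERE player_id = ? AND version = ?",
        (new_name, player_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows touched means the version was stale


# Two writers read the same version, then both try to commit.
_, version_seen_by_a = read_profile(conn, 1)
_, version_seen_by_b = read_profile(conn, 1)

first_ok = update_name(conn, 1, "alice_a", version_seen_by_a)
second_ok = update_name(conn, 1, "alice_b", version_seen_by_b)
print(first_ok, second_ok)  # True False
```

The losing writer has to re-read and retry, which is cheap when conflicts are rare and expensive when they are not. The pessimistic alternative would wrap the read in `SELECT ... FOR UPDATE`, which SQLite does not support, on an engine such as PostgreSQL.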

Under extreme write contention — a jackpot pool adjustment or a burst of bets on one live match — the probability of a rollback under optimistic concurrency control can be modeled statistically as a function of the transaction arrival rate (λ), the hold time of the transaction (t), and the degree of resource overlap (d):

P(rollback) = 1 − e^(−λtd)

As any of those three variables rises, rollback probability climbs toward 1 — which is exactly why pure OCC is a poor fit for jackpot pools but works fine for low-contention profile edits.
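Plugging numbers into the formula shows how sharply the two cases diverge. The input values below are assumptions chosen for illustration, not measurements from any platform.

```python
import math


def rollback_probability(arrival_rate, hold_time, overlap):
    """P(rollback) = 1 - e^(-lambda * t * d), as in the formula above."""
    # -expm1(-x) equals 1 - e^(-x) but stays accurate for very small x.
    return -math.expm1(-arrival_rate * hold_time * overlap)


# Assumed low-contention case: 5 tx/s, 20 ms hold time, 1% resource overlap.
profile_edit = rollback_probability(arrival_rate=5, hold_time=0.02, overlap=0.01)

# Assumed high-contention case: 2,000 tx/s all touching the same jackpot row.
jackpot_pool = rollback_probability(arrival_rate=2000, hold_time=0.02, overlap=1.0)

print(f"profile edit: {profile_edit:.4%}")
print(f"jackpot pool: {jackpot_pool:.4%}")
```

Under these assumptions the profile edit rolls back about once in a thousand attempts, while the jackpot update is all but guaranteed to roll back.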

To avoid write-locking bottlenecks entirely during peak load, more advanced platforms implement lock-free reservation systems: instead of locking a balance row for the duration of a transaction, the application registers an atomic “intent to change” (a reservation), and defers the actual row update and lock acquisition to commit time. This keeps transaction ingestion flowing without exhausting the thread pool.
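The reservation idea can be sketched in a few lines. This is an in-memory toy under stated assumptions: the `Account` class and its method names are hypothetical, and a very short mutex stands in for whatever atomic primitive the real datastore offers, so it is not literally lock-free. What it does show is that the balance is never held locked while the bet is being processed.

```python
import threading


class Account:
    """Toy ledger account; amounts are integers in minor units (e.g. cents)."""

    def __init__(self, balance):
        self.balance = balance
        self.reserved = {}  # reservation_id -> amount
        self._next_id = 0
        self._guard = threading.Lock()  # stand-in for an atomic operation

    def reserve(self, amount):
        """Register an intent to debit. Returns an id, or None if funds are short."""
        with self._guard:
            available = self.balance - sum(self.reserved.values())
            if amount <= 0 or amount > available:
                return None
            self._next_id += 1
            self.reserved[self._next_id] = amount
            return self._next_id

    def commit(self, reservation_id):
        """Apply the deferred balance update for a settled reservation."""
        with self._guard:
            self.balance -= self.reserved.pop(reservation_id)

    def release(self, reservation_id):
        """Drop a reservation whose bet was rejected or timed out."""
        with self._guard:
            self.reserved.pop(reservation_id, None)


acct = Account(balance=10_000)
first = acct.reserve(7_000)   # accepted
second = acct.reserve(7_000)  # rejected: only 3,000 is still unreserved
acct.commit(first)
print(first, second, acct.balance)  # 1 None 3000
```

Between `reserve` and `commit` the account stays fully readable and can accept further reservations up to the unreserved amount. That is what keeps ingestion flowing during a burst.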

High-availability targets of 99.99% uptime also require real-time replication, and operators choose between two models depending on their risk tolerance:

Synchronous replication writes to primary and replica simultaneously — zero data loss (RPO of zero), but every write pays a network round-trip.
Asynchronous replication commits locally first and propagates after — lower write latency, but a small replication lag window where a primary failure could lose data.

Managed engines like Amazon Aurora or Amazon RDS combine both properties reasonably well: Aurora natively replicates across Availability Zones, so a secondary can be promoted to primary within seconds after a localized failure, preserving transaction state without manual intervention.
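It helps to translate the 99.99% target into a concrete downtime budget, because that budget is what failover time has to fit inside. The helper below is a simple arithmetic sketch, with a function name of our choosing.

```python
def downtime_minutes_per_year(availability_pct, days=365):
    """Minutes of allowed downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * days * 24 * 60


four_nines = downtime_minutes_per_year(99.99)
three_nines = downtime_minutes_per_year(99.9)
print(f"99.99% -> {four_nines:.1f} min/year")
print(f"99.9%  -> {three_nines:.1f} min/year")
```

At 99.99% the whole year's allowance is under an hour, so a single failover that takes several minutes consumes a meaningful share of it.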

Multi-jurisdictional compliance: the real constraint on architecture

This is where iGaming infrastructure diverges most sharply from standard SaaS cloud design. Licensing regulators don’t just require “a cloud provider” — they mandate the physical location of servers, how data replicates, and what security certifications the environment carries. Four representative jurisdictions illustrate how differently these rules are written.

Jurisdiction	Server localization	Data residency & replication	Compliance controls
Malta (MGA)	EU/EEA or approved third country.	Requires system traceability and audit trails accessible to MGA on demand.	ISO 27001 focus; risk-based supervision calibrated to licensee profile.
Germany (GGL)	Databases/servers must be physically located within the EU/EEA.	Mandatory integration with LUGAS (deposit tracking) and OASIS (self-exclusion).	Host-provider enforcement prioritized; IP-blocking of access providers ruled unlawful (BVerwG 8 C 3.24).
Brazil (SPA)	ISO 27001-certified data centers within Brazil required.	Real-time reporting to SIGAP; daily logs of identity, financial flows, and betting history.	Mandatory “Face Match” biometrics; BRL 5M financial reserve; 12% GGR tax.
New Jersey (DGE)	Primary gaming/RNG equipment must reside in Atlantic City or DGE-approved facilities.	Geofencing mandatory; wide-area systems must run from central databases in-state.	Strict in-state backup requirements; DGE-approved internal control submissions.

Malta’s flexibility is deceptive — operators can position application engines almost anywhere in the EU/EEA, but must document the exact physical datacenter location, rack ID, internal and external IP addresses, and encryption protocol for replication traffic, and provide the MGA unhindered electronic access for ad-hoc virtual audits. Germany’s regime layers strict marketing controls on top of residency rules: promotional advertising for online poker or slots is banned between 6 AM and 9 PM across TV, radio, and internet. Operators satisfy age-verification requirements without violating GDPR by running facial age-estimation entirely on-device — the biometric payload never reaches a central server, which preserves compliance while still producing an auditable verification record.

Brazil’s framework, built under Law 14,790/2023, is arguably the most demanding technically: beyond the concession fee, operators need a local legal subsidiary with at least 20% local shareholder capital, and the entire platform must pass technical audits from accredited labs such as GLI, BMM North America, or eCOGRA within the first year of operation.
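One way teams keep these rules from drifting out of the architecture is to encode them as a placement check that deployment tooling can call. The sketch below is a deliberately coarse simplification of the localization column in the table above: the regulator keys, location codes, and function names are assumptions, Malta's "approved third country" route is not modeled, and New Jersey is reduced to "in-state". It is an illustration of the pattern, not a statement of what any regulator accepts.

```python
# EU member states plus Iceland, Liechtenstein, and Norway (ISO 3166-1 alpha-2).
EEA = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE", "IS", "LI", "NO",
})


def _country(location):
    """'US-NJ' -> 'US'; 'DE' -> 'DE'."""
    return location.split("-")[0]


# Hypothetical rule table keyed by regulator; each rule takes a location code.
RESIDENCY_RULES = {
    "MGA": lambda loc: _country(loc) in EEA,  # approved third countries not modeled
    "GGL": lambda loc: _country(loc) in EEA,
    "SPA": lambda loc: _country(loc) == "BR",
    "DGE": lambda loc: loc == "US-NJ",        # facility-level approval not modeled
}


def placement_allowed(regulator, location):
    """Return True if regulated data for this regulator may sit at this location."""
    rule = RESIDENCY_RULES.get(regulator)
    if rule is None:
        raise KeyError(f"no residency rule defined for {regulator!r}")
    return rule(location)


print(placement_allowed("GGL", "DE"))     # True
print(placement_allowed("SPA", "US-NJ"))  # False
```

A check like this belongs in the deployment pipeline, so that a regulated database cannot be scheduled outside its boundary by accident.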

Hybrid edge topologies: reconciling residency with cloud scale

Strict localization rules in jurisdictions like New Jersey and Brazil historically forced operators onto legacy on-premises datacenters. Hyperscalers have since closed that gap with physical edge appliances — AWS Outposts, Azure Stack, and Google Distributed Cloud — deployed directly inside approved colocation facilities. The pattern lets operators run regulated database engines, RNGs, and transaction ledgers on hardware physically inside the state or country border, while offloading non-regulated workloads like CDN caching or analytics to the nearest public cloud region.

Hybrid Edge Architecture (AWS Outposts)

AWS Parent Region
  |   [Managed Control Plane via Service Link]
  |
Local Colocation Facility (e.g., Continent 8)
  |-- AWS Outpost (42U rack, 5–15 kVA, redundant power)
  |     |-- Local Gateway (routes local traffic directly)
  |     +-- Dedicated EC2/EKS nodes (regulated database & RNG logic)
  +-- Network Edge as a Service (NEaaS)
        |-- Virtual firewalls & IDS/IPS
        +-- Cloud Connect MPLS (private link to parent region)

Deploying this in practice means coordinating three constraints simultaneously: environmental (42U racks need verified delivery paths and ASHRAE-compliant thermal handling), power (5–15 kVA redundant feeds, typically single-phase in North American facilities), and network (Outposts connect to a customer-owned local edge, which colocation providers like Continent 8 simplify via NEaaS-delivered virtualized routing and direct peering).

AWS documents five distinct hybrid patterns operators choose between, each suited to a different compliance posture:

Pattern	Best fit
Regional deployment	Regional boundary already aligns with local gambling approvals.
Local Zone & Wavelength Zone	Managed edge computing without owning physical hardware.
Local Zone & Outpost	Local database residency combined with public cloud edge processing.
Wavelength Zone & Outpost	High-frequency mobile betting apps within specific telco network zones.
Outposts for primary & secondary sites	Strict physical localization requiring redundant active-active deployments within the state.

To stop regulated data from leaking into a public-region bucket by mistake, operators pair these deployments with landing zone guardrails via AWS Control Tower and Organizations — non-bypassable policies that block APIs from copying rows out of Outpost storage into public-region resources. On the Google Cloud side, Google Distributed Cloud Hosted can be configured with NVIDIA H100 GPUs to run air-gapped instances of Gemma 2 on-premises, enabling conversational search, compliance reporting, and player-behavior monitoring without any PII or transaction data leaving the sovereign facility.

Security architecture: DDoS, fraud detection, and encryption

iGaming platforms are a persistent target for volumetric DDoS attacks, application exploits, and bot networks — and because major matches are time-sensitive, even brief downtime translates directly into lost betting volume. The layered defense typically includes:

Private exchange routing — networks like Continent 8’s Gaming Exchange route transaction and live-odds telemetry entirely off the public internet between operators, aggregators, and payment providers, neutralizing public-facing DDoS exposure while cutting latency.
CDN-terminated public traffic — TCP/TLS handshakes terminate close to the player at global CDN edges, backed by high-capacity scrubbing networks.
Game-specific edge firewalls — tools like OVHcloud’s Game DDoS protection support up to 100 custom L3/L4 rules per IP on bare-metal servers, filtering malformed traffic and UDP reflection floods before they reach application servers.
ML-driven threat and fraud analysis — Amazon GuardDuty continuously monitors VPC flow logs and API access for credential compromise and DNS anomalies, while models on SageMaker or Vertex AI score bet timing and transaction patterns for match-fixing, bonus abuse, and bot activity.
Self-exclusion synchronization — real-time cross-referencing against local databases like GamStop (UK) or OASIS (Germany) to restrict registered players.

On the cryptography side, TLS 1.3 secures data in transit, column-level encryption protects high-value fields like balances, and keys sit in Hardware Security Modules or cloud-native key management. A growing number of high-volume operators are also beginning to pilot post-quantum cryptographic libraries in transactional pipelines, ahead of the risk that today’s encrypted data could be decrypted retroactively once practical quantum attacks emerge.

Putting it together: the modular hybrid blueprint

No single deployment model satisfies both the performance ceiling and the residency floor an iGaming platform needs. A pure public-cloud approach breaks on data residency law; a pure on-premises approach loses the elasticity needed to absorb a World Cup traffic spike. The pattern that holds up in practice is a modular hybrid: latency-critical and strictly regulated engines — ledgers, RNGs, jackpot pools — isolated on bare-metal or localized private cloud inside the approved jurisdiction, while non-regulated services — containerized microservices, player acquisition, global CDN — run on public cloud platforms like Amazon EKS or Google Kubernetes Engine, connected back to the regulated core over private, low-jitter links.

In our work advising operators on this exact split, the recurring mistake isn’t underestimating the performance requirements — it’s underestimating how early the residency rules need to shape the architecture diagram. Retrofitting compliance onto an already-built public-cloud platform is dramatically more expensive than designing the data boundary in from day one.

FAQ

What latency does an iGaming platform actually need?

It depends on the product. Standard casino systems (slots and table games) generally operate well at around 100 ms response time. Real-time multiplayer games and fast-action sports betting require sub-30 ms round-trip latency to maintain fair play and prevent arbitrage exploits such as court-siding, where even a 500 ms delay can create a betting window on events that have already occurred. Modern 5G edge deployments can reduce latency below 10 ms, compared with the roughly 200 ms typical of legacy 4G networks.

Should iGaming platforms use public cloud or dedicated servers?

Most mature platforms use a hybrid approach. Latency-sensitive and highly regulated components—such as RNGs, ledgers, and jackpot pools—typically run on dedicated bare-metal servers to eliminate virtualization noise and meet regulatory residency requirements. Less critical workloads, including CDN delivery and player analytics, are often hosted in the public cloud for scalability and cost efficiency during traffic spikes. Public cloud egress fees can also make fully cloud-based live betting infrastructure expensive at scale.

How does data residency differ between Malta, Germany, Brazil, and New Jersey?

Requirements vary significantly. Malta allows primary infrastructure anywhere within the EU/EEA but requires real-time replication of regulated data to a Maltese data center. Germany requires servers and databases to remain within the EU/EEA. Brazil requires systems and user databases to be hosted in ISO 27001-certified data centers located in Brazil. New Jersey has the strictest rules, requiring primary gaming servers, equipment, and RNGs to be physically located in Atlantic City casino hotels or DGE-approved facilities.

Why do iGaming platforms use WebSockets instead of standard HTTP APIs?

Live odds, dealer streams, and real-time game state require continuous bidirectional communication. Traditional HTTP request-response cycles introduce unnecessary overhead through repeated requests and headers. WebSockets establish a persistent full-duplex connection that allows lightweight data frames to be exchanged with minimal latency after the initial handshake.

What database concurrency model is best for a betting platform?

Most betting platforms combine several concurrency models. Pessimistic locking is used for high-contention financial operations such as balance updates and jackpot pools. Optimistic concurrency control works well for low-contention tasks like profile updates. MVCC supports high-volume read operations such as player history and audit logs because readers do not block writers. Advanced platforms may also implement lock-free reservation mechanisms for better scalability during peak traffic.

Can an operator use AWS Outposts to satisfy strict data residency laws?

Yes. AWS Outposts and similar edge platforms allow regulated services such as databases, RNGs, and transaction ledgers to run on hardware physically located within an approved jurisdiction while integrating with public cloud services for less regulated workloads. Successful deployments require appropriate power, cooling, networking, and secure connectivity back to the parent cloud region.

How do operators protect real-time betting platforms from DDoS attacks?

Operators typically use layered protection. Private gaming networks keep critical traffic off the public internet, CDN edge services absorb large-scale attacks, edge firewalls block malicious traffic before it reaches application servers, and AI-driven monitoring systems detect fraud and traffic anomalies in real time.

How should an operator start planning multi-jurisdiction iGaming infrastructure?

Start with regulatory requirements rather than infrastructure technology. Data residency, audit, and localization rules should define the architecture from the outset because redesigning infrastructure later to achieve compliance is significantly more expensive than building it correctly from the beginning. Gart Solutions' cloud architects help operators map regulatory requirements to scalable hybrid cloud architectures before infrastructure investments begin.

Compliance

Compliance-by-design: why loot box regulation is starting to look like an MGA audit

Roman Burdiuzha

June 29, 2026

PEGI — the age-rating body used across more than 35 European countries — rolled out the biggest change to its classification framework in over a decade. Starting in June 2026, any game with paid random items gets a minimum PEGI 16 rating, regardless of its content otherwise. It's not a gambling law. It's an age-rating body quietly admitting that loot boxes need to be treated as a distinct risk category — which is one more data point in a pattern that's been building for years: regulators haven't agreed that loot boxes are gambling, but they increasingly want the same kind of proof a gambling regulator would demand. That's the actual story here, and it's worth being precise about it rather than overstating it. Loot boxes are not legally classified as gambling in the UK, most of the EU, or under US federal law, as of this writing. But "not legally gambling" and "not regulated" have stopped being the same thing — and the infrastructure needed to satisfy the second is converging, fast, on something that already exists in iGaming: an auditable, reproducible record of exactly how chance-based outcomes are generated and disclosed. TL;DR • The legal status is genuinely fragmented: Belgium bans paid loot boxes outright. The UK and most of the EU don't classify them as gambling. The US has no federal law at all — just FTC consumer-protection enforcement and a closely watched state lawsuit. • The pressure is real even without a gambling classification. The EU's Digital Services Act already restricts practices that drive "excessive or compulsive spending" by minors, independent of any future gambling law. PEGI's new rules just landed. The EU's Digital Fairness Act is expected to propose binding rules later this year. • The crux test everywhere is "money or money's worth." Items that can be cashed out on a secondary market blow up the usual exemption — which is exactly the legal theory behind New York's attorney general suing Valve over Counter-Strike 2 skins. 
• The practical answer looks like an RNG audit, not a legal opinion. Drop-rate logging, deterministic replay, and age-gating records — the same evidence an MGA or UKGC auditor expects from a casino game — are becoming the default expectation for loot boxes too, classification debate aside. $15B+ estimated annual loot box revenue PEGI 16 new minimum rating floor, from June 2026 Q4 2026 EU Digital Fairness Act proposal expected This article summarizes the regulatory landscape as we understand it in June 2026. It is not legal advice — the law here is moving quickly and varies by jurisdiction and product mechanic, so any compliance decision should be checked against current counsel for the specific markets you operate in. The patchwork, as of June 2026 The UK still does not treat loot boxes as gambling. The Gambling Act 2005 requires a prize to be "money or money's worth," and the UK Gambling Commission's long-standing position is that in-game items don't meet that bar because the publisher itself doesn't let you cash them out. The government reaffirmed this position again in January 2026, while noting it is "keeping possible future legislative options under review" — language it has now used for several years running. In its place, the industry runs a self-regulatory code (UKIE's principles, published in 2023) covering disclosure and age-gating, with the government able to step in if that proves insufficient. The EU has no single law treating loot boxes as gambling either — gambling regulation stays with individual member states, which is why approaches differ so sharply across the bloc. Belgium banned paid loot boxes outright back in 2018, treating them as illegal gambling under its existing framework, and that ban remains in force. 
The Netherlands took a different and more complicated path: its gambling authority initially fined EA roughly €10 million over FIFA Ultimate Team packs, but that fine was later overturned after a court found the mechanic, integrated as it was into normal gameplay, didn't constitute a standalone gambling product — a reversal worth knowing about, since the original fine is still the version of this story most commonly repeated. Poland has drafted amendments that would require a gambling licence for chance-based purchase mechanics, and the European Parliament's internal market committee voted in October 2025 to push for the EU's incoming Digital Fairness Act to ban loot-box-style mechanics in games accessible to minors — a proposal the European Commission is expected to table later in 2026, not a law that exists yet. Separately — and already in force, independent of any gambling classification — the EU's Digital Services Act restricts platforms accessible to minors from using practices that can drive excessive or compulsive spending, and the European Parliament has explicitly read that obligation as covering paid loot boxes with randomized content. This matters because it means EU compliance pressure on monetization design didn't wait for a gambling law; it's already live through a different legal door. The United States has no federal loot box law of any kind. Enforcement instead comes through the FTC, using ordinary consumer-protection and children's-privacy law (COPPA) rather than gambling statutes — a settled case already established that platforms must block under-16 purchases without verified parental consent. State bills in New York, Hawaii, Washington, and Indiana have proposed loot-box-specific rules; none has passed as of this writing. 
The case to actually watch is New York's attorney general suing Valve, arguing that Counter-Strike 2's loot boxes constitute illegal gambling under state law — grounded directly in the fact that CS2 skins have a real, liquid secondary market, which is the exact crack in the "no cash value" argument that every other jurisdiction's exemption also depends on. The test that keeps breaking: "money or money's worth" Almost every jurisdiction's gambling exemption for loot boxes rests on the same idea: it's only gambling if the prize has real monetary value, and a publisher who doesn't let you cash out hasn't given you that. It's a clean legal test, until a secondary market exists where players trade those items for real money anyway — at which point the "the publisher doesn't cash you out" defense stops mattering, because someone else effectively does. This is precisely the architecture decision sitting at the center of the Valve lawsuit, and it's worth treating as exactly that — an architecture decision, not just a legal one. Whether a game's items are tradeable, how easily they convert to cash through third-party markets, and how directly the publisher facilitates or merely tolerates that trade are product and infrastructure choices made well before any court gets involved. A studio that enables frictionless secondary trading of randomized-drop items is choosing to operate closer to the line that separates "not gambling" from "functionally gambling" in multiple jurisdictions at once. ⚖️ Whether to support a secondary market for randomized items is a decision with real regulatory exposure attached — and it's worth mapping before it's built, not after a regulator asks about it. Gart Solutions' compliance audit service covers exactly this kind of architecture-level risk review. 
What "compliance-by-design" actually means here iGaming operators already live with this problem solved, because they had no choice — a real-money casino game without a defensible audit trail simply doesn't get licensed. Our <a href="https://gartsolutions.com/industries/igaming/">iGaming practice</a> is built around exactly this: deterministic replay so any past outcome can be reconstructed from stored seed and state, version-locked deployment so a tested build is provably the one that shipped, and continuous logging that can answer a regulator's question about drop rates or RTP without a scramble. Game studios shipping loot boxes have rarely had to build any of that, because until recently nobody outside the studio was asking. That's changing on three fronts at once: PEGI's new rating floor makes the mechanic itself a labeled risk category rather than an invisible design choice; the EU's DSA already creates spending-pattern obligations independent of gambling law; and the Valve case shows a state attorney general willing to use existing gambling statutes against a mechanic that was never designed with that scrutiny in mind. None of these require a new "loot boxes are gambling" law to bite — they bite under the laws and rating systems that already exist. The practical response looks less like a legal memo and more like an infrastructure project: a verifiable, append-only log of what a given pull's odds actually were and what it produced, age-verification records that hold up under a regulator's request rather than just a checkbox, and a documented decision — made deliberately, not by default — about whether and how items can move into a secondary market. That's the same category of evidence an MGA or UKGC audit already expects. The studios that build it before they're asked won't be rebuilding their monetization stack under a deadline; the ones that don't are betting on the current patchwork staying exactly as fragmented as it is today. 
🔍 An RNG and drop-rate audit trail built for an iGaming regulator transfers almost directly to a loot-box compliance request. Gart Solutions' IT audit services cover the same deterministic-replay and logging architecture across both.

The takeaway for both industries

The honest summary is that nobody (not Brussels, not London, not Washington) has settled this question, and anyone telling you with confidence exactly what the rules will say in twelve months is guessing. What is settled is the direction: more disclosure, more age-gating, and more scrutiny of secondary markets, arriving through age-rating bodies, consumer-protection law, and state attorneys general even where a gambling classification never lands. Building the audit infrastructure now isn't a bet on any one outcome; it's the same infrastructure either way.

Could your monetization stack answer a regulator's question today?

Gart Solutions helps both iGaming operators and game studios build the audit trails, drop-rate logging, and compliance architecture that hold up under real scrutiny, before a regulator or a lawsuit asks first. Talk to our architects →

DevOps

Someone else’s bug, your downtime: why bookmakers and game studios share the same third-party risk

Fedir Kompaniiets

June 29, 2026

On the morning of July 19, 2024, three of Australia's largest betting operators (Tabcorp, Sportsbet, and Ladbrokes) went dark within minutes of each other. None of them had pushed a bad deploy. None of them had a security breach. The cause sat entirely outside their own codebases, inside a security vendor's routine update to software they hadn't written a line of, running on millions of machines.

We've already written about this shape of problem once, in our breakdown of Final Fantasy XIV's 2021 login crisis, a case where the real constraint was a global chip shortage that Square Enix had no control over. This is the same category of failure, but faster, more sudden, and arguably more dangerous: a single vendor's mistake, pushed automatically, with zero warning and zero opportunity to test it first.

TL;DR

• CrowdStrike, July 2024: a flawed security update bricked 8.5 million Windows machines worldwide in under 90 minutes, including the systems behind Tabcorp, Sportsbet, and Ladbrokes simultaneously.
• AWS, October 2025: an internal DNS race condition inside DynamoDB took down a wide swath of the internet for hours, including Fortnite and Roblox, alongside Disney+, Reddit, and a Premier League broadcast.
• The fix being fast didn't make recovery fast. CrowdStrike reverted its bad update in 78 minutes, but every machine that had already crashed needed a person, physically, to boot into Safe Mode and delete a file by hand.
• The shared lesson: you can't patch your way out of a dependency you don't control. You can only decide, in advance, how much blast radius one vendor's bad day is allowed to have.
8.5M Windows devices crashed worldwide
78 min to revert the update (recovery still took days)
$5.4B+ estimated direct cost to Fortune 500 firms

The bookmakers: CrowdStrike takes down three operators at once

At 04:09 UTC on July 19, 2024, CrowdStrike pushed a routine configuration update (a "Channel File") to every Windows machine running its Falcon security sensor. The update was meant to improve detection of a specific attack technique. Instead, it contained a mismatch: the update assumed a data structure with 21 fields, but the actual content shipped with only 20. That single discrepancy triggered an out-of-bounds memory read inside Falcon's kernel-level driver, and the driver crashed every Windows machine it was running on, immediately and on every subsequent boot attempt, because the driver loaded early in the startup sequence.

Roughly 8.5 million Windows devices crashed within the hour, by Microsoft's own count. Tabcorp and Sportsbet (together responsible for more than 70% of Australia's wagering market) went down alongside Ladbrokes. Betting stopped entirely, online and in retail outlets. Tote price finalization froze mid-calculation, which meant payouts on bets already placed couldn't be settled until the underlying systems came back. Both operators publicly attributed the outage to "a global external technical issue," which was accurate: neither had any path to fix it themselves.

What makes this case distinct from a typical outage is what happened after CrowdStrike found the bug. The company reverted the faulty update at 05:27 UTC, 78 minutes after it shipped. In a normal software incident, that's the end of the story: bad deploy rolled back, service restored. Here, it wasn't. Every machine that had already crashed was stuck in a boot loop, because the damage was done locally on each device before the revert ever reached it.
Recovery required someone to physically access each affected machine, boot into Safe Mode, locate a specific system file, and delete it by hand, one machine at a time, sometimes complicated further by BitLocker disk encryption requiring a separate recovery key. For organizations with thousands of endpoints, that's not a fix measured in minutes. It's a fix measured in however many hands you have available.

The game: an AWS database failure takes down Fortnite and Roblox

On October 20, 2025, a separate but structurally identical story played out in the gaming industry. Amazon's DynamoDB, a managed database service that much of the internet quietly depends on, often without realizing how deeply, suffered a DNS failure in its largest region, US-East-1. The proximate cause, per AWS's own postmortem, was a race condition: an internal system called a DNS Enactor that updates DynamoDB's DNS records ran unusually slowly for one execution, while a second, parallel Enactor processed updates far faster than normal. The mismatch between the two led to DynamoDB's DNS records effectively being emptied, and every system trying to reach DynamoDB through its public endpoint, including a large share of AWS's own internal services, began failing immediately.

The outage rippled outward in a way that surprised even engineers who consider themselves dependency-aware. Disney+, Reddit, Snapchat, Coinbase, the McDonald's app, and UK government tax services all went down. So did Fortnite and Roblox, reported alongside the others as players found themselves unable to connect. Independent analysis of the incident noted a detail worth sitting with: services that monitor other services' uptime were themselves casualties. Status pages built on Atlassian's Statuspage product couldn't be updated, meaning some companies couldn't even tell their own users what was happening, because the tool they'd use to say so depended on the same failing infrastructure.
The outage lasted around three hours before AWS engineers manually intervened to restore DynamoDB's DNS. For a live-service game, three hours during a peak window isn't a minor blip. It's measured in lost engagement, refund requests, and the same kind of player trust erosion we covered when we looked at what happens when sportsbooks go down during the World Cup. The mechanism was completely different (a malicious attack versus an internal race condition inside a trusted vendor's infrastructure), but the experience on the other end of the connection looked the same: the platform isn't responding, and there's nothing the player-facing team can do about it directly.

🧭 Most teams can name their direct vendors. Far fewer can name their vendors' vendors. The AWS incident took down services that didn't think of themselves as AWS-dependent at all; they depended on something that depended on DynamoDB. Gart Solutions' infrastructure audit service is built around mapping that second and third layer of dependency before it becomes a 3 a.m. discovery.

Why "it wasn't our bug" doesn't help you at 3 a.m.

Both incidents share a structure that's worth naming directly, because it's the part most incident-response planning misses. It isn't just "depend less on third parties"; for any real-time platform, some third-party dependency is unavoidable. The actual lesson is narrower and more actionable:

• A vendor's fast fix doesn't guarantee a fast recovery for you. CrowdStrike reverted its bad update in under 90 minutes. That timeline meant almost nothing to organizations whose machines had already crashed, because the recovery step required physical, manual intervention that no amount of vendor speed could shortcut.
• Your dependency map is deeper than your vendor list. Plenty of companies hit by the AWS DynamoDB failure didn't think of themselves as exposed to it. They depended on a tool that depended on AWS, two or three layers removed from a decision anyone on their team actually made.
• The blast radius is a design choice, even when the bug isn't yours. Whether a single vendor's failure takes down your entire platform or just a degraded subset of features is determined by how much of your stack assumes that vendor will always be there, not by how good the vendor's engineering team is.
• "Not our bug" doesn't buy you patience from players or regulators. Tabcorp and Sportsbet were transparent about the external cause, and it didn't make the outage shorter or the customer frustration smaller. The same will be true for a game studio explaining an AWS-shaped outage to a community mid-launch.

🛟 The honest goal isn't eliminating third-party risk; it's bounding it before it's a live incident. Failover paths, degraded-mode design, and a tested incident response plan for "the outage isn't ours but the downtime is" are core to Gart Solutions' SRE practice.

The takeaway for both industries

A sportsbook can't audit CrowdStrike's source code, and a game studio can't audit AWS's internal DNS systems. That's not the point. The point is that both incidents were entirely predictable in shape, if not in timing: any platform with a deep enough dependency on a single vendor will eventually inherit that vendor's worst day, and the only real choices left at that point are how much of your platform that worst day is allowed to take down with it, and how fast a human can actually act once it does.

That's an architecture and incident-response question, not a vendor-selection one; switching vendors just relocates the same risk. The work is in mapping where a single point of failure actually sits in your stack, deciding what degrades gracefully versus what goes dark entirely, and rehearsing the manual recovery steps before you need them at 3 a.m. with thousands of angry players or bettors watching a status page that, ironically, might also be down.

Do you know what happens when your biggest vendor has its worst day?

Gart Solutions maps the dependency chains most teams don't see until they fail, and builds the failover and incident-response plans that bound the damage when they do. Talk to our architects →
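The point that a dependency map runs deeper than a vendor list can be sketched as a small graph walk. The Python below is a minimal illustration under stated assumptions: the `deps` mapping and every service name in it are hypothetical, not the real topology of any company mentioned in this article, and `transitive_dependents` is an illustrative helper rather than a tool from any particular product. Given "who depends on what," it returns everything that fails, directly or indirectly, when one vendor does.

```python
from collections import deque


def transitive_dependents(deps, vendor):
    """Return every service that depends on `vendor`, directly or indirectly.

    `deps` maps each service to the list of things it needs in order to run.
    """
    # Invert the edges: for each dependency, record who relies on it.
    relied_on_by = {}
    for service, needs in deps.items():
        for need in needs:
            relied_on_by.setdefault(need, set()).add(service)

    # Breadth-first walk outward from the failing vendor.
    affected = set()
    queue = deque([vendor])
    while queue:
        node = queue.popleft()
        for dependent in relied_on_by.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical stack: only "session-store" lists the managed database directly.
deps = {
    "betting-api": ["session-store", "odds-feed"],
    "session-store": ["managed-db"],
    "status-page": ["hosted-status-vendor"],
    "hosted-status-vendor": ["managed-db"],
    "odds-feed": [],
    "retail-terminals": ["endpoint-security-agent"],
}

blast_radius = transitive_dependents(deps, "managed-db")
print(sorted(blast_radius))
```

In this made-up example the status page ends up inside the blast radius even though nobody on the team chose the managed database for it, which is the second-and-third-layer exposure described above. The hard part in practice is collecting an accurate `deps` mapping; the traversal itself is the easy step.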

DevOps

What World Cup sportsbook attacks and game-launch outages have in common

Fedir Kompaniiets

June 29, 2026

Right now, while the 2026 FIFA World Cup's expanded 48-team tournament plays out across the US, Mexico, and Canada, sports-betting platforms are taking some of the heaviest DDoS pressure they'll see all year. Security researchers tracking the tournament have documented attack traffic against betting platforms climbing steadily through late May, then sharply from June 5 onward as kickoff approached. On the day before the opening match came a single traffic spike that dwarfed everything before it: over a million requests in one burst, more than three times the previous peak.

That's not a coincidence, and it's not really a new story either. A few weeks ago we published a breakdown of three real, public postmortems from game launches (Fortnite, Final Fantasy XIV, and Helldivers 2) that all broke under sudden, extreme load. None of those were attacks. They were legitimate demand. But the shape of the failure, and increasingly the shape of the defense required, looks the same whether the traffic wants to hurt you or just wants to play.

TL;DR

• The pattern is identical at the infrastructure layer: a near-vertical request curve with no ramp-up, arriving faster than a human can classify it as malicious or legitimate.
• World Cup sportsbooks (2026): real tracked attacks have hit roughly 18,000 requests per second with zero warm-up, deliberately routed through dozens of countries to defeat geo-blocking.
• Game launches (Fortnite, 2018): the same near-vertical curve, except every request was a real paying player, and it still exhausted AWS instance limits and IP pools just as fast.
• The shared lesson: if your defense depends on a human deciding "is this an attack or just success," you've already lost the seconds that matter.
18,000 requests/sec, zero warm-up
87 sec window before a cascade spreads
70–75% forecast rise in World Cup betting volume

The attack: what's actually hitting sportsbooks this World Cup

Threat researchers monitoring sports-betting platforms during the 2026 World Cup have published a detailed breakdown of the pattern: traffic against one tracked platform spiked to roughly 18,000 requests per second in what's described as a near-vertical wall, with no ramp-up, no warm-up period, and no gradual escalation. Within seconds of the initial surge, the geographic composition broadened rapidly: an initial spike from Russia-origin traffic was quickly joined by US, German, Indonesian, Singaporean, and a dozen other country sources, each adding hundreds to low thousands of requests per second.

That spread isn't random. Spreading the source footprint across many countries within seconds makes any single-country block largely useless, and researchers note the traffic draws entirely on proxy infrastructure and data centers with an established history of malicious activity: a pre-assembled operation, not opportunistic reuse. None of it reflects a real betting platform's actual user base; a European-regulated sportsbook simply doesn't get organic traffic from a dozen unrelated countries within the same few seconds.

The operational detail that matters most for defenders: researchers estimate roughly 87 seconds between the first signal and the point where the attack cascades broadly enough that manual, human-in-the-loop response is no longer fast enough. Automated, real-time blocking at millisecond latency isn't a nice-to-have here; it's the only posture that has a chance.

And the stakes are specifically tied to the product itself. In-play betting (placing wagers while a match is live) is one of the highest-margin features sportsbooks offer, and it's consistently the first thing to break under load.
Industry reporting suggests roughly a third of bets during a major tournament final are placed in-play, and the tolerance for delay is brutal: the difference between a two-second and a five-second response during a key moment isn't a minor glitch, it's a missed bet, a frozen cash-out, and a player who doesn't give the platform a second chance.

The launch: what hit Fortnite at 3.4 million concurrent players

We covered this in detail in our breakdown of three real game-launch postmortems, but it's worth pulling the relevant thread here specifically: when Fortnite hit a then-unprecedented 3.4 million concurrent players in February 2018, part of what broke was strictly a capacity ceiling that had nothing to do with game logic. Epic's own postmortem describes hitting AWS's regional instance limits running on fleets of c4.8xlarge instances, and running out of IP addresses in their standard subnets purely from the pace of scaling: a near-vertical demand curve that exhausted infrastructure quotas in roughly the same shape a coordinated attack would.

The traffic wasn't malicious. Every one of those requests was a real player wanting to play a game they'd already downloaded. But from the perspective of the infrastructure underneath (the load balancers, the connection pools, the cloud provider's regional quotas), a sudden, extreme, geographically broad surge in connections looks remarkably similar whether it's organic enthusiasm or a botnet. The failure mode wasn't "we got attacked." It was "we got more legitimate demand than our quotas and pooling assumptions could absorb fast enough," which is functionally the same shape of problem a DDoS defense exists to handle.

🛡️ This is exactly why DDoS-readiness and launch-readiness end up being the same engineering exercise. Whether the surge is malicious or just successful, the fix is the same: automated, real-time response that doesn't wait on a human classification step.
Gart Solutions' security audit service is built around stress-testing exactly this distinction before it's tested for you, live.

Why the same infrastructure has to defend against both

The uncomfortable truth for anyone running a real-time platform (a sportsbook during in-play betting, a game server during a launch spike) is that in the first several seconds, a malicious DDoS surge and a legitimate viral demand spike can look identical at the network layer. Same near-vertical request curve. Same overwhelmed connection pool. Same sudden geographic and behavioral pattern that doesn't match yesterday's baseline.

That's not a reason to give up on telling them apart; it's the reason the first line of defense can't depend on telling them apart at all. The systems that survive both scenarios share the same design properties regardless of which one they're facing:

• Elastic capacity that triggers on pattern, not on classification. Autoscaling and rate-limiting need to respond to "this looks anomalous" within seconds, not wait for a security team or a war room to confirm intent.
• Geo- and behavior-aware edge mitigation, because both attackers and viral demand show up as traffic shapes that don't match an operator's real, known user base, and that signal is available before anyone's looked at a single request payload.
• Quota and connection-pool headroom built for the spike, not the average, because cloud provider regional limits and IP exhaustion don't care whether the requests hitting them are well-intentioned.
• A fallback that degrades gracefully rather than falling over completely. Queuing, graceful rate-limiting, or a holding page beats a total outage whether the cause is 2 million real fans or 20,000 requests a second from a botnet.

Sportsbooks during a World Cup and game studios during a launch are solving variations of the exact same problem, and most of them are doing it with teams and tooling that were built for one or the other, not both.
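Two of the properties above, triggering on pattern rather than classification and degrading gracefully, can be shown in a few lines of Python. This is a minimal sketch under stated assumptions, not a production rate limiter: the `SpikeGuard` class, its parameter names, and the threshold values are all hypothetical, and it uses a simple fixed one-second window where real edge mitigation would use something smoother. Note that it never asks whether a request is malicious. It looks only at the shape of the traffic.

```python
class SpikeGuard:
    """Per-second admission control that reacts to traffic shape only."""

    def __init__(self, baseline_rps, capacity_rps, spike_factor=3.0):
        self.baseline_rps = baseline_rps    # expected steady-state rate (assumed known)
        self.capacity_rps = capacity_rps    # what the backend can safely serve
        self.spike_factor = spike_factor    # multiple of baseline treated as anomalous
        self.window = None                  # the whole second currently being counted
        self.count = 0

    def admit(self, now):
        """Return (decision, anomalous) for one request arriving at time `now`."""
        second = int(now)
        if second != self.window:           # a new one-second window starts
            self.window, self.count = second, 0
        self.count += 1

        # Pattern trigger: fires on rate alone, with no intent classification.
        # A caller would use this flag to start scaling out or tightening edge rules.
        anomalous = self.count > self.baseline_rps * self.spike_factor

        # Graceful degradation: beyond capacity, hold the request (queue or
        # holding page) instead of letting the backend fall over.
        if self.count > self.capacity_rps:
            return "hold", anomalous
        return "serve", anomalous


guard = SpikeGuard(baseline_rps=10, capacity_rps=50)
decisions = [guard.admit(0.5) for _ in range(60)]   # 60 requests in one second
```

With these illustrative numbers, the anomaly flag turns on at the 31st request in the window, well before the 51st request is the first to be held. That gap is the headroom in which automated scaling or mitigation can act without a human in the loop.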
📡 The defensive posture that holds up under a real attack is the same one that holds up under real success. Real-time anomaly detection, automated mitigation, and capacity that doesn't wait for a human in the loop are the core of Gart Solutions' SRE practice, built for platforms where the difference between a good night and a very bad one is measured in seconds.

The takeaway for both industries

If you operate a sportsbook, the next major tournament (or even the next big goal in this one) is a live test of whether your platform can tell a coordinated attack from a crowd of real bettors fast enough to matter, without making either group wait. If you run a live-service game, your next content drop or marketing push is the same test wearing a different shirt.

Neither industry should be solving this from scratch. The shape of the problem (sudden, extreme, geographically anomalous traffic that has to be absorbed or mitigated in seconds, not minutes) has been documented publicly, repeatedly, by both sides. The infrastructure that handles it well doesn't ask "is this an attack," it asks "can we absorb or shed this safely either way," and answers that question automatically before a person ever gets paged.

Is your platform ready for its next traffic spike, attack or success?

Gart Solutions runs security and infrastructure audits built around exactly this distinction: real-time, automated readiness for sudden load, whether it's malicious or just means you're winning.

