How we mitigated a DDoS attack on our website

Last week, one of our websites fell victim to a DDoS (Distributed Denial of Service) attack. This post details how we detected, analyzed, and mitigated the attack.

Detection and Initial Response

On July 12, 2024, at approximately 02:43 UTC, our 24/7 support team raised a high-priority alert: our primary web application was unresponsive. Immediate investigation revealed that our Railway-hosted container had failed due to a surge in incoming requests. The sheer volume of traffic had overwhelmed our allocated resources, pushing CPU and memory utilization beyond sustainable limits.

These signs pointed to a DDoS attack, prompting our immediate response.
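
To make the "surge in incoming requests" signal concrete, here is a minimal sketch of a sliding-window surge check. It is not the alerting we actually ran: the function name `createSurgeDetector`, the window length, the baseline, and the multiplier are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical sliding-window surge detector (illustrative, not our production alerting).
// windowMs: length of the window in milliseconds.
// baselinePerWindow: assumed typical number of requests per window.
// factor: how many times the baseline counts as a surge.
function createSurgeDetector(
  windowMs: number,
  baselinePerWindow: number,
  factor: number
): (nowMs: number) => boolean {
  const timestamps: number[] = [];

  // Record one request at time nowMs and report whether the window is in surge.
  return function record(nowMs: number): boolean {
    timestamps.push(nowMs);

    // Drop requests that have fallen out of the window.
    const cutoff = nowMs - windowMs;
    while (true) {
      const oldest = timestamps[0];
      if (oldest === undefined || oldest > cutoff) break;
      timestamps.shift();
    }

    return timestamps.length > baselinePerWindow * factor;
  };
}
```

A check like this only says that traffic is far above normal. Deciding that the surge is an attack rather than a legitimate spike still needs the kind of traffic analysis described later in this post.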

Mitigation Strategy and Execution

Phase 1: Emergency Traffic Filtering

Our first line of defense leveraged Cloudflare's robust security infrastructure:

- We immediately activated Cloudflare's "Under Attack" mode for the affected domain. This mode shows every visitor an interstitial challenge page before the request reaches our origin, filtering out automated clients that cannot complete it.
- Cloudflare analytics revealed an initial influx of over 30 million requests per hour, orders of magnitude above our typical traffic.
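
"Under Attack" mode can be switched on from the Cloudflare dashboard or through the API. As a sketch of the API route, the function below builds the request that sets a zone's security level to `under_attack`. The helper name `buildUnderAttackRequest` and the `ApiRequest` type are our own illustrative names, and the zone ID and API token are placeholders supplied by the caller. The sketch only describes the request; it does not send it.

```typescript
// Description of an HTTP request. Nothing is sent by this sketch.
interface ApiRequest {
  url: string;
  method: "PATCH";
  headers: Record<string, string>;
  body: string;
}

// Build the request that sets a zone's security level to "under_attack".
// zoneId and apiToken are placeholders; real values come from your Cloudflare account.
function buildUnderAttackRequest(zoneId: string, apiToken: string): ApiRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/settings/security_level`,
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ value: "under_attack" }),
  };
}
```

The resulting object can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.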

Phase 2: Infrastructure Resilience

With the initial traffic filtering in place, we pivoted to fortify our application hosting:

We made the strategic decision to migrate our hosting from Railway to Vercel. This move was driven by two key factors:
- Vercel's recent launch of Web Application Firewall (WAF) and DDoS mitigation services, offering an additional layer of defense.
- Vercel's serverless architecture, which provides scalability and resilience against resource exhaustion attacks.
The migration was executed quickly, bringing our application back online within minutes.

Phase 3: Advanced Threat Mitigation

With basic service restored, we focused on neutralizing the ongoing attack:

- Traffic analysis revealed a highly distributed attack vector, using multiple IP addresses and different User Agent strings, complicating simple IP-based blocking.
- We paired Cloudflare's "Under Attack" mode with custom Web Application Firewall (WAF) rules, implementing adaptive security thresholds.
- Recognizing that the attackers solely targeted our homepage, we deployed Cloudflare's "Managed Challenge" feature. This mechanism presents legitimate users with a turnstile challenge, effectively thwarting automated attack tools.
- Post-implementation, Vercel analytics confirmed an immediate drop in malicious traffic, signaling the attack's neutralization.
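
For readers who want to reproduce the homepage challenge, the sketch below shows roughly what such a custom rule looks like as data. The field names follow the general shape of a Cloudflare custom rule (a Rules language expression plus an action), but the `CustomRule` type, the function name, and the description text are our own illustrative choices, not an export of the exact rule we deployed.

```typescript
// Illustrative shape of a custom WAF rule; not an export of our real configuration.
interface CustomRule {
  description: string;
  expression: string; // written in the Cloudflare Rules language
  action: "managed_challenge" | "skip" | "block";
  enabled: boolean;
}

// A rule that issues a Managed Challenge only for requests to the homepage path.
function homepageChallengeRule(): CustomRule {
  return {
    description: "Challenge requests to the homepage during the attack",
    expression: 'http.request.uri.path eq "/"',
    action: "managed_challenge",
    enabled: true,
  };
}
```

Scoping the expression to the attacked path keeps the challenge away from pages and endpoints the attackers were not hitting.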

Navigating Unintended Consequences

Our aggressive mitigation strategy inadvertently introduced some collateral issues:

- Certain API endpoints crucial for dashboard functionality were initially blocked by our Cloudflare WAF rules.
- Image rendering, provided by the Next.js image component, was interrupted.

We addressed these issues by implementing a granular whitelist strategy:

- All traffic to /api/ endpoints was explicitly allowed, preserving application functionality.
- Requests to /images/ were whitelisted to restore image rendering.
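
The ordering matters here: the allow rules have to be evaluated before the challenge. The sketch below models that logic locally as a first-match-wins rule list. It is a simplified model for illustration, not Cloudflare's rule engine, and the names `allowlistRules` and `decide` are hypothetical.

```typescript
type Decision = "allow" | "challenge";

interface PathRule {
  description: string;
  matches: (path: string) => boolean;
  decision: Decision;
}

// Ordered rule list: the allow rules sit ahead of the catch-all challenge.
const allowlistRules: PathRule[] = [
  {
    description: "API endpoints used by the dashboard",
    matches: (path) => path.startsWith("/api/"),
    decision: "allow",
  },
  {
    description: "Image requests",
    matches: (path) => path.startsWith("/images/"),
    decision: "allow",
  },
  {
    description: "Everything else while the attack is ongoing",
    matches: () => true,
    decision: "challenge",
  },
];

// Return the decision of the first rule that matches the request path.
function decide(path: string): Decision {
  for (const rule of allowlistRules) {
    if (rule.matches(path)) return rule.decision;
  }
  return "challenge"; // defensive default; the catch-all rule normally matches first
}
```

With this ordering, a request to `/api/` or `/images/` is allowed before the catch-all is reached, while the homepage still receives a challenge. Note that a path-based allowlist also exempts attackers who target those paths, so it is worth watching traffic to the allowed prefixes.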

These adjustments restored full functionality for legitimate users while maintaining our hardened security rules against the attack.

Attack Analysis and Impact

The scale of this DDoS attack was a first for our organisation:

- Over a span of mere hours, our infrastructure faced more than 300 million requests.
- The distributed nature of the attack, leveraging a botnet spanning multiple geographic regions, demonstrated a high level of sophistication.
- The attack's primary objective was to disrupt our web presence, with no evidence of data exfiltration or ransom demands.
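
To put these numbers in perspective, 30 million requests per hour works out to roughly 8,333 requests per second. The sketch below shows that conversion, plus a simple way to quantify how distributed a traffic sample is: the share of requests coming from the single busiest source. Both function names are hypothetical, and the per-source counts would come from your own logs.

```typescript
// Convert an hourly request count into an average per-second rate.
function requestsPerSecond(requestsPerHour: number): number {
  return requestsPerHour / 3600;
}

// Share of all requests that came from the single busiest source (0 to 1).
// countsPerSource holds one request count per source IP.
function topSourceShare(countsPerSource: number[]): number {
  let total = 0;
  let max = 0;
  for (const count of countsPerSource) {
    total += count;
    if (count > max) max = count;
  }
  return total === 0 ? 0 : max / total;
}
```

A low top-source share means that blocking the busiest IP removes only a small slice of the traffic, which is why IP-based blocking alone was not enough in our case.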

By continuously evolving our defenses and leveraging cutting-edge security technologies, we're better equipped to face the ever-changing landscape of cyber threats, ensuring the availability of our web services.