Pingara: The Affordable Monitoring Solution

When an incident occurs, the first question is always "why?" Pingara's AI-powered root cause analysis uses Google's Gemini 2.5 Flash model to analyze performance metrics and suggest the most likely cause of the failure.

How It Works

Data Collection

Every HTTP check captures a detailed performance breakdown:

Metric	What It Measures
DNS Lookup Time	Time to resolve hostname to IP address
TCP Connect Time	Time to establish a TCP connection
TLS Handshake Time	Time to negotiate SSL/TLS (HTTPS only)
Time to First Byte (TTFB)	Time until the server starts sending data
Total Duration	Full request/response cycle time
Response Size	Size of the response body
Status Code	HTTP response code
Error Type	Category of failure (if any)
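
As a rough illustration of how a probe could record a per-phase breakdown like the one above, here is a minimal Python sketch. It is not Pingara's implementation: the `PhaseTimer` class, the phase names, and the `timed_dns_lookup` helper are all hypothetical.

```python
import socket
import time
from contextlib import contextmanager


class PhaseTimer:
    """Records how long each named phase of a check takes, in milliseconds."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be exercised in tests
        self.timings_ms = {}

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the phase raised an error.
            self.timings_ms[name] = (self._clock() - start) * 1000.0

    def total_ms(self):
        """Sum of every phase recorded so far."""
        return sum(self.timings_ms.values())


def timed_dns_lookup(host, port=443):
    """Hypothetical helper: time only the DNS resolution phase for a host."""
    timer = PhaseTimer()
    with timer.phase("dns"):
        socket.getaddrinfo(host, port)
    return timer.timings_ms
```

The same `phase(...)` context manager could wrap the TCP connect, TLS handshake, and first-byte read, so that each phase ends up as its own entry in `timings_ms`.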

AI Analysis

When an incident is created, Pingara feeds these metrics into the Gemini 2.5 Flash model via Google's Genkit framework. The AI analyzes patterns and returns actionable root cause suggestions.

Input to the AI:

{
  "dnsLookupTime": 2500,
  "tcpConnectTime": null,
  "tlsHandshakeTime": null,
  "httpStatusCode": 0,
  "responseTime": 2500,
  "errorMessage": "DNS lookup failed: NXDOMAIN"
}

AI output:

Root cause hints:
1. DNS resolution failure — The domain could not be resolved.
   Possible causes: expired domain, misconfigured DNS records,
   DNS provider outage.
2. Check your domain registrar to confirm the domain hasn't expired.
3. Verify DNS records are correctly configured with your provider.
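
To make the input/output exchange above more concrete, the following Python sketch turns a metrics payload into a text prompt. It is only an illustration: the prompt wording and the `build_rca_prompt` function are assumptions, it does not use the Genkit API, and it is not the prompt Pingara actually sends. The times are assumed to be in milliseconds.

```python
import json


def build_rca_prompt(metrics):
    """Build a plain-text prompt from one failed check's metrics (hypothetical)."""
    payload = json.dumps(metrics, indent=2, sort_keys=True)
    return (
        "You are assisting with uptime incident triage.\n"
        "Given the HTTP check metrics below (times assumed to be milliseconds), "
        "list the most likely root causes, most probable first, "
        "and one verification step for each.\n\n"
        f"Metrics:\n{payload}\n"
    )


# A subset of the fields from the example input above.
example_metrics = {
    "dnsLookupTime": 2500,
    "httpStatusCode": 0,
    "errorMessage": "DNS lookup failed: NXDOMAIN",
}

if __name__ == "__main__":
    print(build_rca_prompt(example_metrics))
```

The resulting string would then be passed to whichever model client is in use. Sorting the keys keeps the prompt stable between runs, which makes responses easier to compare.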

Interpreting Performance Metrics

Understanding what each metric tells you is key to effective troubleshooting.

DNS Lookup Time

Normal: 10–50ms
Elevated: 200ms+
Failed: Returned error

Symptom	Likely Cause
Slow (200ms+)	DNS server overloaded or geographically distant
Very slow (1000ms+)	DNS provider experiencing issues
Failed (NXDOMAIN)	Domain doesn't exist or DNS misconfigured
Failed (SERVFAIL)	DNS server error
Failed (timeout)	DNS server unreachable

Action: Check your DNS provider's status page. Consider using a faster DNS provider or adding redundant DNS servers.
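
The DNS table above can be read as a simple lookup. The sketch below encodes it in Python. The function name, the error labels, and the wording for the 50–200ms gap (which the table does not cover) are assumptions made for illustration.

```python
# Causes for failed lookups, taken from the table above.
DNS_FAILURE_CAUSES = {
    "NXDOMAIN": "Domain doesn't exist or DNS misconfigured",
    "SERVFAIL": "DNS server error",
    "TIMEOUT": "DNS server unreachable",
}


def interpret_dns(lookup_ms=None, error=None):
    """Map a DNS lookup time (ms) or error label to a likely cause."""
    if error is not None:
        return DNS_FAILURE_CAUSES.get(error.upper(), "Unrecognized DNS failure")
    if lookup_ms is None:
        raise ValueError("provide either lookup_ms or error")
    if lookup_ms >= 1000:
        return "DNS provider experiencing issues"
    if lookup_ms >= 200:
        return "DNS server overloaded or geographically distant"
    if lookup_ms <= 50:
        return "Normal"
    # Assumption: the table does not say how to label 50-200ms.
    return "Above the normal range but below the elevated threshold"
```

For example, `interpret_dns(lookup_ms=2500)` falls into the "very slow" row, while `interpret_dns(error="NXDOMAIN")` returns the cause listed for a nonexistent domain.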

TCP Connect Time

Normal: 10–100ms (depends on geographic distance)
Elevated: 500ms+

Symptom	Likely Cause
Slow	Network congestion between probe and server
Connection refused	Server is up but not accepting connections on that port
Timeout	Server unreachable, firewall blocking, or host down

Action: Check server firewall rules, verify the service is listening on the expected port, and check for network issues between the probe region and your server.
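
When reproducing a TCP failure by hand, it helps to tell "connection refused" apart from "timeout". The Python sketch below shows one way to do that with the standard `socket` module. The function names and return labels are hypothetical, and the 5-second timeout is an arbitrary choice.

```python
import socket


def classify_connect_error(exc):
    """Map a connection exception to one of the symptoms in the table above."""
    if isinstance(exc, socket.gaierror):
        # Name resolution failed before any TCP connection was attempted.
        return "dns_failure"
    if isinstance(exc, ConnectionRefusedError):
        # Host answered, but nothing is accepting connections on that port.
        return "connection_refused"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # No answer at all: host down, unreachable, or a firewall dropping packets.
        return "timeout"
    return "other"


def try_connect(host, port, timeout_s=5.0):
    """Attempt a TCP connection and report the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "connected"
    except OSError as exc:
        return classify_connect_error(exc)
```

Running `try_connect` from a machine near the probe region and again from inside your own network can help narrow down whether a firewall or the network path is responsible.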

TLS Handshake Time

Normal: 50–200ms
Elevated: 500ms+

Symptom	Likely Cause
Slow	Server CPU bottleneck during key exchange
Failed (expired)	SSL certificate has expired
Failed (invalid)	Certificate doesn't match hostname
Failed (self-signed)	Certificate not trusted

Action: Check certificate validity, ensure your server supports modern TLS protocols, and consider enabling TLS session resumption.
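
As one way to check certificate validity ahead of time, the sketch below computes how many days remain before a certificate expires. It assumes you already have the certificate's `notAfter` string in the format that Python's `ssl` module reports (for example from `SSLSocket.getpeercert()`). The function name is hypothetical.

```python
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires (negative if already expired).

    not_after: expiry time such as "Jan  5 09:34:43 2018 GMT".
    now: Unix timestamp to compare against; defaults to the current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400.0
```

A scheduled job could call this for each monitored host and raise a warning when the result drops below a renewal window of your choosing.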

Time to First Byte (TTFB)

Normal: 100–500ms
Elevated: 1000ms+

TTFB is the most telling metric for backend performance because it includes:

Server processing time
Database query execution
External API calls
Cache lookups

Symptom	Likely Cause
Consistently slow	Backend performance issue (slow queries, missing cache)
Intermittently slow	Resource contention, garbage collection pauses
Very slow (5000ms+)	Server overloaded, deadlock, or infinite loop

Action: Profile your backend application. Check database query performance, caching effectiveness, and server resource utilization.
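
The difference between "consistently slow" and "intermittently slow" in the table above can be approximated from a series of recent TTFB samples. The Python sketch below is illustrative only: the 1000ms cutoff comes from the "Elevated" value above, while the 80% fraction and the function name are assumptions.

```python
def ttfb_pattern(samples_ms, slow_ms=1000, consistent_fraction=0.8):
    """Label a series of TTFB samples (ms) as normal, intermittent, or consistent."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    slow = sum(1 for s in samples_ms if s >= slow_ms)
    if slow == 0:
        return "normal"
    # Assumption: "consistently" means at least 80% of samples are slow.
    if slow / len(samples_ms) >= consistent_fraction:
        return "consistently slow"
    return "intermittently slow"
```

A "consistently slow" result points toward slow queries or a missing cache, while "intermittently slow" points toward resource contention or garbage collection pauses.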

Total Duration

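The sum of all phases: DNS lookup, TCP connect, TLS handshake, time to first byte, and the time to receive the rest of the response. If total duration exceeds your configured timeout, the check fails with a timeout error.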
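
As a worked example of the timeout rule, the Python sketch below sums the phase timings of a slow but successful check (DNS 25ms, TCP 40ms, TLS 130ms, TTFB 4500ms) and compares the total against a timeout. The function names and the timeout values are hypothetical, and response download time is left out because no figure for it is given.

```python
def total_duration_ms(phases_ms):
    """Sum the phases that actually ran; skipped phases are stored as None."""
    return sum(v for v in phases_ms.values() if v is not None)


def timed_out(phases_ms, timeout_ms):
    """True if the summed phases exceed the configured timeout."""
    return total_duration_ms(phases_ms) > timeout_ms


slow_check = {"dns": 25, "tcp": 40, "tls": 130, "ttfb": 4500}

if __name__ == "__main__":
    print(total_duration_ms(slow_check))   # 25 + 40 + 130 + 4500 = 4695
    print(timed_out(slow_check, 30000))    # False: well under a 30s timeout
    print(timed_out(slow_check, 3000))     # True: over a 3s timeout
```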

Common Root Cause Patterns

Pattern 1: DNS Failure

DNS: Failed | TCP: N/A | TLS: N/A | TTFB: N/A
Error: DNS lookup failed

Likely cause: Domain expired, DNS records deleted, or DNS provider outage.

Pattern 2: Server Unreachable

DNS: 25ms | TCP: Timeout | TLS: N/A | TTFB: N/A
Error: Connection timeout

Likely cause: Server is down, firewall blocking traffic, or network outage.

Pattern 3: SSL Certificate Issue

DNS: 30ms | TCP: 45ms | TLS: Failed | TTFB: N/A
Error: SSL certificate expired

Likely cause: Certificate expired and wasn't renewed. Check your certificate management process.

Pattern 4: Application Error

DNS: 20ms | TCP: 35ms | TLS: 150ms | TTFB: 80ms
Status: 503 Service Unavailable

Likely cause: Application crashed, deployment in progress, or backend dependency failure.

Pattern 5: Performance Degradation

DNS: 25ms | TCP: 40ms | TLS: 130ms | TTFB: 4500ms
Status: 200 (but very slow)

Likely cause: Database bottleneck, cache miss storm, or resource exhaustion.
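
The five patterns above amount to checking the phases in order and stopping at the first one that failed. The Python sketch below expresses that as a rule-based matcher. It is a simplification for illustration, not the AI analysis itself: the `FAILED` and `TIMEOUT` markers, the function name, and the choice of 1000ms as the slow-TTFB cutoff are assumptions.

```python
FAILED = "failed"
TIMEOUT = "timeout"


def match_pattern(dns, tcp, tls, ttfb, status=None, slow_ttfb_ms=1000):
    """Return the first matching pattern name, or None if nothing matches.

    Each phase is a time in ms, a FAILED/TIMEOUT marker, or None for N/A.
    """
    if dns == FAILED:
        return "Pattern 1: DNS Failure"
    if tcp == TIMEOUT:
        return "Pattern 2: Server Unreachable"
    if tls == FAILED:
        return "Pattern 3: SSL Certificate Issue"
    if status is not None and status >= 500:
        return "Pattern 4: Application Error"
    if isinstance(ttfb, (int, float)) and ttfb >= slow_ttfb_ms:
        return "Pattern 5: Performance Degradation"
    return None
```

Checking the phases in request order matters: a DNS failure leaves every later phase as N/A, so later rules only make sense once the earlier phases have succeeded.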

Viewing Root Cause Hints

From the Incident Detail Page

1. Navigate to Incidents → [Your Incident]
2. Look for the Root Cause Analysis section
3. Review the AI-generated hints

From the Root Cause Dialog

Click the Analyze button on any incident to trigger a fresh AI analysis with the latest available data.

Limitations

AI analysis is a suggestion, not a definitive diagnosis
Works best with clear failure patterns (DNS failure, timeout, 5xx errors)
May be less specific for intermittent or complex multi-service failures
Requires sufficient check result data for meaningful analysis

Best Practices

Combine AI Hints with Manual Investigation

Use root cause hints as a starting point, then verify with your own monitoring tools, server logs, and infrastructure dashboards.

Track Patterns Over Time

If the same root cause appears repeatedly:

DNS failures → Consider switching DNS providers
TLS issues → Automate certificate renewal (e.g., Let's Encrypt)
TTFB spikes → Invest in backend performance optimization
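
To spot recurring causes like these in a list of past incidents, a simple tally is often enough. The Python sketch below assumes each incident has already been labeled with a short category such as "dns", "tls", or "ttfb". Those labels, the function name, and the threshold of three occurrences are hypothetical.

```python
from collections import Counter

# Follow-up actions from the list above, keyed by hypothetical category labels.
RECOMMENDATIONS = {
    "dns": "Consider switching DNS providers",
    "tls": "Automate certificate renewal (e.g., Let's Encrypt)",
    "ttfb": "Invest in backend performance optimization",
}


def recurring_recommendations(incident_causes, min_count=3):
    """Return (cause, count, recommendation) for causes seen at least min_count times."""
    counts = Counter(incident_causes)
    return [
        (cause, n, RECOMMENDATIONS[cause])
        for cause, n in counts.most_common()
        if n >= min_count and cause in RECOMMENDATIONS
    ]
```

For example, a history of `["dns", "tls", "dns", "ttfb", "dns"]` would surface only the DNS recommendation, since it is the only cause that appears three times.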

Root cause hints are included in incident reports. Share these with your engineering team during post-mortem reviews to drive infrastructure improvements.

Next Steps

Understanding Incidents — Incident lifecycle and detection
HTTP/HTTPS Monitoring — Performance metrics in detail
Apdex Scoring — Performance thresholds and scoring
