How our tools work, end-to-end

Methodology

Last updated: April 24, 2026 — maintained by the Krawly Editorial Team.

This page exists for two reasons. First, transparency — readers, advertisers, and search engines deserve to know how Krawly arrives at the answers it shows. Second, accountability — when a result looks wrong, we want it to be diagnosable rather than mysterious.

1. The request lifecycle

When you submit a URL or other input to a Krawly tool, the request hits our Django API at krawly.io/api/v1/tools/<tool>/. The API rate-limits anonymous traffic to 30 calls per day per IP, records a minimal log line (timestamp + tool slug + input fingerprint + result code), and dispatches the job to the right service module. For tools that fetch external content, we then perform an outbound request — typically with a 15-second socket timeout, a recognisable User-Agent: Krawly/1.0 header, and TLS verification turned on.
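The outbound fetch described above can be sketched as follows. This is a minimal illustration, not the production code: the timeout and User-Agent values come from this page, while the function names are hypothetical.

```python
import urllib.request

KRAWLY_UA = "Krawly/1.0"    # identifiable crawler User-Agent, per this page
FETCH_TIMEOUT = 15          # socket timeout in seconds

def build_request(url: str) -> urllib.request.Request:
    """Build the outbound request described above (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": KRAWLY_UA})

def fetch_target(url: str) -> bytes:
    # One plain GET; TLS certificate verification is on by default in urllib.
    with urllib.request.urlopen(build_request(url), timeout=FETCH_TIMEOUT) as resp:
        return resp.read()
```

The point of the explicit User-Agent is diagnosability: a site operator reviewing their access logs can attribute the request to Krawly rather than to an anonymous bot.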

2. Crawl etiquette

We respect robots.txt for tools whose entire job is to crawl a site (broken-link checker, sitemap-health checker, internal-link mapper). For one-shot lookups against a single user-supplied URL — the equivalent of pasting that URL into a browser — we follow the same etiquette a regular browser would: one request, normal headers, no aggressive parallelism. We do not provide any tool whose purpose is to bypass authentication, scrape behind a paywall, evade rate limits, or impersonate another crawler.
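For the site-wide crawlers, the robots.txt check is the standard one; a sketch using Python's stdlib parser (the rules shown are a made-up example):

```python
import urllib.robotparser

def crawl_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Evaluate a robots.txt body for a given UA and path (hypothetical helper).

    One-shot single-URL lookups skip this step, matching what a browser does.
    """
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, path)

rules = """\
User-agent: *
Disallow: /private/
"""
crawl_allowed(rules, "Krawly", "/private/page")  # False: the crawler backs off
crawl_allowed(rules, "Krawly", "/public/page")   # True: crawling may proceed
```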

3. Data sources

Different tools draw from different sources. The honest list:

  • Public HTML and HTTP responses for SEO, content, screenshot, and scraping tools (we fetch the same bytes a browser would).
  • DNS resolvers (system resolver plus public DNS providers) for DNS, MX, DMARC, and propagation tools.
  • Public WHOIS for domain age, availability, and ownership tools.
  • Certificate Transparency logs, queried via crt.sh, plus Subfinder for subdomain discovery (no auth required).
  • Google autocomplete & YouTube oEmbed for keyword and YouTube research tools (rate-limited public endpoints).
  • Internet Archive Wayback Machine API for the Wayback Checker.
  • Have I Been Pwned k-anonymity API for the email breach checker (we never send full secrets; we hash the input with SHA-1 and send only the first five characters of the hash).
  • yt-dlp for media inspection on social-media tools — we extract metadata only and do not store downloaded content.
  • Open Food Facts & UPCitemdb for the barcode lookup.
  • We do not silently use paid third-party APIs that require attribution. If a tool is built on top of a paid API, we say so on its page.
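The k-anonymity range query mentioned in the list works roughly as follows. This is a sketch under the scheme HIBP documents for its public Pwned Passwords range endpoint; the function name is ours, and only the five-character prefix ever leaves the machine.

```python
import hashlib

def hibp_range_query(secret: str) -> tuple[str, str]:
    """Split a SHA-1 hash into the prefix that is sent and the suffix kept local."""
    digest = hashlib.sha1(secret.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Only `prefix` is sent, e.g. GET https://api.pwnedpasswords.com/range/<prefix>.
    # The response lists every breached suffix sharing that prefix; the match
    # against `suffix` happens locally, so the full hash is never transmitted.
    return prefix, suffix
```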

4. Accuracy and confidence levels

Every Krawly result has an honest confidence ceiling determined by what the source data lets us promise:

  • High: Deterministic results from authoritative sources — DNS records, HTTP headers, JSON-LD blocks, exact-match regex extractions.
  • Medium: Heuristic results that depend on signature matching — CMS detection, tech-stack fingerprinting, JS-framework detection. We tell you which signals matched and you can decide whether to trust the inference.
  • Low: Inference-based results that need human judgement — username availability across platforms (HEAD-request inference), social-profile guessing, and similar OSINT lookups.
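To make the Low tier concrete, here is roughly why HEAD-request inference cannot promise more. The mapping below is an illustrative assumption about how a status code might be read, not Krawly's exact decision table:

```python
def infer_availability(status: int) -> str:
    """Map an HTTP status from a profile URL to a low-confidence verdict.

    Hypothetical helper: a 404 suggests the username is free, a 200 suggests
    it is taken, and anything else (rate limiting, redirects to login walls,
    soft-404 pages) is inconclusive — which is why this tier is labelled Low.
    """
    if status == 404:
        return "likely available"
    if status == 200:
        return "likely taken"
    return "inconclusive"
```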

5. Rate limits and pricing

Anonymous users get 30 tool runs per IP per day, and that free quota covers what most readers will ever need. Paid plans and the developer SDK exist for teams running automated workflows; they fund the residential proxies, headless browser farms, and AI-feature GPUs that keep the free tier free for everyone else. See the About page for the funding model in full.

6. Where we cache and where we do not

Tool inputs are not cached across users — your keyword research is not used to seed someone else's keyword suggestions, and your SEO audit URL is not added to a public list. Some upstream metadata (DNS records, Subfinder enumerations) is cached briefly per-IP to reduce repeated outbound load on third-party services; this cache is keyed by the request itself and never shared cross-user.
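"Keyed by the request itself and never shared cross-user" can be illustrated with a cache-key shape like the one below. The exact key format is an assumption; the property that matters is that the IP is part of the key, so no other user can ever hit your cached entry.

```python
import hashlib

def upstream_cache_key(tool: str, ip: str, request_input: str) -> str:
    """Per-IP, per-request cache key (hypothetical shape).

    The same IP asking the same question again hits the cache;
    a different IP asking the same question gets a cache miss.
    """
    fingerprint = hashlib.sha256(request_input.encode("utf-8")).hexdigest()[:16]
    return f"upstream:{tool}:{ip}:{fingerprint}"
```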

7. Failure modes we expose

When a tool cannot deliver a clean answer, we return an explicit error rather than fabricated output. The most common cases are: the target site is offline (we say so and surface the HTTP status); the target site rate-limited us (we say so); the third-party API we depend on returned an error (we say which one); the input was malformed (we say what we expected). If you see a result that looks wrong, you can usually diagnose it from the error string — and if you cannot, please write to info@krawly.io with the tool name and the input you tried.

8. Privacy of inputs

We log the tool slug and a fingerprint of the input (truncated to 50 characters in most cases) for abuse detection and quota enforcement. We do not log full input bodies, generated passwords, or scraped email lists in our long-term database. Anonymous users are identified only by their IP within the rolling 24-hour rate-limit window, and nothing else. Detailed handling is in our Privacy Policy.
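One plausible shape for the truncated fingerprint is a hash digest cut to 50 characters. This is an assumption about the scheme, not the exact implementation; the property it illustrates is that the log can group repeated inputs without storing the full body.

```python
import hashlib

def input_fingerprint(raw: str, limit: int = 50) -> str:
    """Truncated fingerprint logged for abuse detection (assumed scheme):
    enough to spot the same input being hammered repeatedly, not enough
    to reconstruct a long input from the log line."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:limit]
```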

9. Change log

When we change a methodology in a way that affects what users see — for example switching DNS resolvers, adding a new data source, changing a confidence threshold — we update this page and bump the "last updated" date at the top. We do not silently revise the rules.

10. Audit and questions

If you would like to verify any specific claim above, want to suggest a correction, or have a methodology concern that affects how we should report results, please write to info@krawly.io. Detailed answers, including source links, are part of the job.