This page documents how we collect, anonymize, and analyze the AI tarot reading data we publish on this site. We update it whenever the methodology changes.
Last updated: 2026-05-06.
Sample composition
Our current dataset:
- 1,370 readings total
- ~750 unique participants, composed of:
  - 69 registered users (identified by user_id, strictly deduplicated; 24% of readings)
  - ~680 anonymous guest sessions (identified by IP fingerprint; 76% of readings)
- 7 languages (EN 90.7%, PL 3.6%, PT 2.9%, FR 1.2%, ES 0.9%, DE 0.4%, IT 0.2%)
- Time window: 2026-01-01 to 2026-05-02
- 1,261 readings with question text (the rest are "draw without question" requests)
Important caveat: guest IP fingerprints are an imperfect proxy for people. They undercount unique participants when several users share an IP (household, university, corporate NAT), and they overcount them when one person connects from several IPs (mobile, home, and work count as 3 participants, so the returning user goes unrecognized). Treat ~750 as a rough order-of-magnitude estimate, not a precise number. The figure of 69 registered users is exact.
The dataset grows continuously. Quarterly snapshots are published with full statistics. Real-time stats can include up to one quarter of data collected after the latest published snapshot, so the two may differ.
What we collect
For each reading, our application logs:
| Field | Type | Purpose |
|---|---|---|
| Reading ID | UUID | Unique identifier |
| User ID hash | SHA-256 | Anonymized user grouping |
| Spread type | enum | Which spread (3-card, Celtic, etc.) |
| Cards drawn | array of card IDs | Order matters (positions) |
| Reversed flags | array of bool | Per card |
| Question text | text (optional) | If user provided |
| Question category | enum | Auto-categorized: future, love, work, money, health, family, uncategorized |
| Language | ISO 639-1 | UI language at time of reading |
| Timestamp | UTC | Date + time |
| AI model | enum | gpt-5.4 / claude-sonnet-4.6 / gemini-2.5-flash / nvidia-llama-3.3 |
| User rating | 1-5 (optional) | Post-reading feedback if given |
What we don't log: raw IP addresses (we store only a SHA-256 hash, for security), email, name, physical location beyond the country code from IP geolocation, browser fingerprints, or any other personally identifying data.
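
As an illustration of the logged fields, the table above could be modeled as a typed record. This is a minimal sketch, not the application's actual schema: the class name `ReadingRecord`, the field names, the use of strings for card IDs, and the validation rules are all assumptions made for the example. Only the category values come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4

# Category values copied from the table; everything else is hypothetical.
CATEGORIES = {"future", "love", "work", "money", "health", "family", "uncategorized"}


@dataclass(frozen=True)
class ReadingRecord:
    reading_id: UUID
    user_id_hash: str                  # hex SHA-256 digest (64 characters)
    spread_type: str                   # e.g. "3-card"
    cards_drawn: tuple[str, ...]       # order matters: index = position in spread
    reversed_flags: tuple[bool, ...]   # one flag per drawn card
    language: str                      # ISO 639-1 code, e.g. "en"
    timestamp: datetime                # timezone-aware, UTC
    ai_model: str
    question_category: str = "uncategorized"
    question_text: Optional[str] = None
    user_rating: Optional[int] = None  # 1-5 when given

    def __post_init__(self) -> None:
        if len(self.cards_drawn) != len(self.reversed_flags):
            raise ValueError("one reversed flag is required per card")
        if len(self.user_id_hash) != 64:
            raise ValueError("user_id_hash must be a 64-character hex digest")
        if self.question_category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.question_category}")
        if self.user_rating is not None and not 1 <= self.user_rating <= 5:
            raise ValueError("user_rating must be between 1 and 5")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware (UTC)")
```

Keeping cards and reversed flags as parallel tuples mirrors the two array columns in the table; the length check guards against them drifting apart.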
Anonymization process
User IDs in published statistics are SHA-256 hashes with a per-snapshot salt, so the same user maps to a different hash in each snapshot. The probability of a hash collision is negligible (2^256 possible hashes, 69 users).
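
A minimal sketch of salted hashing and of the collision estimate, using only the Python standard library. The function names (`new_snapshot_salt`, `pseudonymize`, `collision_bound`), the 32-byte salt length, and the way the salt is combined with the ID (simple prefixing) are assumptions for illustration; the exact construction used in production is not specified here.

```python
import hashlib
import secrets


def new_snapshot_salt() -> bytes:
    """Generate a fresh random salt for one snapshot (assumed length: 32 bytes)."""
    return secrets.token_bytes(32)


def pseudonymize(user_id: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + user_id (assumed construction)."""
    return hashlib.sha256(salt + user_id.encode("utf-8")).hexdigest()


def collision_bound(n_users: int, bits: int = 256) -> float:
    """Birthday upper bound: P(any collision) <= n(n-1)/2 / 2^bits."""
    return n_users * (n_users - 1) / 2 / 2**bits


if __name__ == "__main__":
    salt_q1 = new_snapshot_salt()
    salt_q2 = new_snapshot_salt()
    # Same user, different snapshots: the hashes cannot be linked.
    print(pseudonymize("user-123", salt_q1))
    print(pseudonymize("user-123", salt_q2))
    print(collision_bound(69))
```

For 69 users the bound is on the order of 10^-74. A keyed construction such as `hmac.new(salt, user_id_bytes, hashlib.sha256)` is a common alternative to prefixing the salt.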
For published per-card statistics, we apply k-anonymity with k=5:
- Combinations of (language + spread_type + week) with fewer than 5 observations are aggregated to higher-level groupings before publication
- Individual reading IDs never appear in public datasets
- Question text is published only in aggregated category counts, never verbatim
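
The small-cell rule above can be sketched as a roll-up over counts. This is an illustrative sketch under stated assumptions, not the audit procedure itself: the function name `roll_up`, the `"*"` wildcard marker, the generalization order (week first, then spread type, then language), and the final suppression of cells that are still too small are all hypothetical choices.

```python
from collections import Counter

K = 5  # minimum observations per published cell


def roll_up(counts, k=K):
    """Generalize cells with fewer than k observations until they reach k.

    counts maps (language, spread_type, week) -> number of observations.
    Small cells lose one field at a time (replaced by "*"); cells that are
    still below k after all three fields are generalized are dropped.
    """
    published = Counter()
    pending = Counter(counts)
    for field in (2, 1, 0):  # assumed order: week, spread_type, language
        next_pending = Counter()
        for key, n in pending.items():
            if n >= k:
                published[key] += n
            else:
                generalized = key[:field] + ("*",) + key[field + 1:]
                next_pending[generalized] += n
        pending = next_pending
    for key, n in pending.items():
        if n >= k:
            published[key] += n
    return published


if __name__ == "__main__":
    cells = {
        ("en", "3-card", "2026-W01"): 12,
        ("pl", "3-card", "2026-W01"): 2,
        ("pl", "3-card", "2026-W02"): 4,
        ("it", "celtic", "2026-W01"): 1,
    }
    for key, n in roll_up(cells).items():
        print(key, n)
```

In this example the two small Polish cells merge into one cell of 6 once the week is generalized, while the single Italian observation never reaches 5 and is left out.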
The full anonymization audit is performed before each quarterly publication. Audit notes are included in the dataset download.
AI provider attribution
Readings are generated using one of five LLMs, served through three providers (NVIDIA, OpenRouter, Anthropic), depending on user tier and queue status:
- NVIDIA Llama 3.3 70B — free tier fallback (last resort)
- OpenRouter Gemini 2.5 Flash — primary free tier (≥90% of free readings)
- OpenRouter Qwen3-235B — secondary free tier
- OpenRouter GPT-5.4 — paid Tier 1 ("Seeker") readings
- Anthropic Claude Sonnet 4.6 — paid Tier 2 ("Mystic") dual-oracle readings
Per-reading AI provider attribution is included in the dataset for researchers wanting to compare AI behavior across providers.
Statistical limitations
Three limitations matter:
Sample size. 1,370 readings is enough to detect strong effects (a 50%+ deviation from random, for instance) but not enough for fine-grained per-card significance testing. To claim that a specific card appears more often than chance, we'd need approximately 6,000 readings per the standard chi-square sample size calculation for a 78-category distribution. At 1,370 readings, we're a little under a quarter of the way there.
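
For reference, a chi-square goodness-of-fit check against a uniform 78-card distribution can be sketched as follows. This is a minimal standard-library sketch, not our analysis code, and it does not reproduce the sample-size calculation behind the 6,000 figure. The function names are hypothetical, the input is assumed to be one draw count per card, and the critical value uses the Wilson-Hilferty approximation rather than an exact chi-square quantile.

```python
from statistics import NormalDist


def chi_square_uniform(observed):
    """Return (statistic, degrees of freedom) for a test against uniformity."""
    total = sum(observed)
    expected = total / len(observed)  # same expected count for every card
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def approx_critical_value(df, alpha=0.05):
    """Approximate chi-square critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


if __name__ == "__main__":
    # Made-up counts for illustration only: 78 cards, one slightly over-drawn.
    counts = [50] * 77 + [65]
    stat, df = chi_square_uniform(counts)
    print(stat, df, stat > approx_critical_value(df))
```

The null hypothesis of uniform draws is rejected at level `alpha` when the statistic exceeds the critical value for 77 degrees of freedom.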
Selection bias. Our users are not a representative sample of all tarot users globally. They are people who:
- Found aimag.me through search, social, or referral
- Speak one of our supported languages
- Were comfortable using a web-based AI tarot tool
- Self-selected into our funnel
Generalization to "all tarot users" is not warranted from this dataset.
Observational, not experimental. We don't randomize and we have no control group, so we can't establish causation. We can describe patterns. We can't claim to explain them.
Update cadence
- Quarterly snapshots: January, April, July, October. Published as a versioned dataset with anonymization audit notes.
- Real-time aggregate stats: updated daily on this site (live counters, top cards, day-of-week distribution).
- Per-reading data: never published in real time; always batched into quarterly anonymized snapshots.
Conflict of interest
The author of this research operates aimag.me, the AI tarot tool from which this data is collected. This is disclosed on every page. We have a financial interest in users finding tarot useful enough to subscribe to paid tiers.
To minimize bias from this conflict:
- We publish data even when it's unflattering to AI tarot (e.g., the Major:Minor randomness finding directly undermines mystical claims)
- We commit to publishing all quarterly snapshots regardless of what they show
- We document and explain methodology changes whenever they happen
- The dataset itself is open under a Creative Commons license, so anyone can run their own analysis and disagree with our interpretations
License
Published statistics on this site are released under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Citation format:
aimag.me Tarot Reading Dataset (n=1,370). Collected 2026-01-01 to 2026-05-02. Anonymized open dataset. Available at aimag.me/research.
Questions
For methodology questions, dataset access requests, or replication queries: [email protected].
For RODO/GDPR-related data subject requests, see our Privacy Policy.