This page documents how we collect, anonymize, and analyze the AI tarot reading data we publish on this site. We update it whenever the methodology changes.

Last updated: 2026-05-06.

Sample composition

Our current dataset:

1,370 readings total
~750 unique participants — composed of:
- 69 registered users (defined by user_id; deduplication strict; 24% of readings)
- ~680 anonymous guest sessions (by IP fingerprint; 76% of readings)
7 languages (EN 90.7%, PL 3.6%, PT 2.9%, FR 1.2%, ES 0.9%, DE 0.4%, IT 0.2%)
Time window: 2026-01-01 to 2026-05-02
1,261 readings with question text (the rest are "draw without question" requests)

Important caveat: guest IP fingerprints distort the participant count in both directions. They undercount unique participants when multiple users share an IP (household, university, corporate NAT), and they overcount when one returning person appears under several IPs (mobile, home, and work count as three). Treat ~750 as a rough order-of-magnitude estimate, not a precise number. The registered-user figure of 69 is exact.

The dataset grows continuously. We publish quarterly snapshots with full statistics. Because real-time stats keep updating between snapshots, they may run ahead of the latest published snapshot by up to one quarter of data.

What we collect

For each reading, our application logs:

Field	Type	Purpose
Reading ID	UUID	Unique identifier
User ID hash	SHA-256	Anonymized user grouping
Spread type	enum	Which spread (3-card, Celtic, etc.)
Cards drawn	array of card IDs	Order matters (positions)
Reversed flags	array of bool	Per card
Question text	text (optional)	If user provided
Question category	enum	Auto-categorized: future, love, work, money, health, family, uncategorized
Language	ISO 639-1	UI language at time of reading
Timestamp	UTC	Date + time
AI model	enum	gpt-5.4 / claude-sonnet-4.6 / gemini-2.5-flash / nvidia-llama-3.3
User rating	1-5 (optional)	Post-reading feedback if given
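
As an illustration only, the fields in the table above could be modeled as a typed record. This is a minimal sketch, not our production schema: the class name `ReadingRecord`, the Python field names, the use of integers for card IDs, and the validation rules are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReadingRecord:
    """Hypothetical in-memory shape of one logged reading."""
    reading_id: uuid.UUID
    user_id_hash: str                  # hex SHA-256 digest (64 characters)
    spread_type: str                   # e.g. "3-card", "celtic"
    cards_drawn: Tuple[int, ...]       # order matters: index = position
    reversed_flags: Tuple[bool, ...]   # one flag per drawn card
    language: str                      # ISO 639-1 code, e.g. "en"
    timestamp: datetime                # timezone-aware, UTC
    ai_model: str
    question_text: Optional[str] = None
    question_category: str = "uncategorized"
    user_rating: Optional[int] = None  # 1-5 if the user left feedback

    def __post_init__(self) -> None:
        # Assumed consistency checks; the real application may differ.
        if len(self.cards_drawn) != len(self.reversed_flags):
            raise ValueError("need exactly one reversed flag per card")
        if len(self.user_id_hash) != 64:
            raise ValueError("user_id_hash must be a hex SHA-256 digest")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware (UTC)")
        if self.user_rating is not None and not 1 <= self.user_rating <= 5:
            raise ValueError("user_rating must be between 1 and 5")


example = ReadingRecord(
    reading_id=uuid.uuid4(),
    user_id_hash="a" * 64,
    spread_type="3-card",
    cards_drawn=(0, 21, 45),
    reversed_flags=(False, True, False),
    language="en",
    timestamp=datetime.now(timezone.utc),
    ai_model="gemini-2.5-flash",
)
```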

What we don't log: raw IP addresses (we keep only a SHA-256 hash, for security), email, name, physical location beyond the country code from IP geolocation, browser fingerprints, or any other personally identifying data.

Anonymization process

User IDs in published statistics are SHA-256 hashes with a per-snapshot salt. The probability of a hash collision is negligible (2^256 hash space, 69 users).
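
The salted hashing described above can be sketched roughly as follows. This is an assumption-laden illustration, not our exact pipeline: the function names, the 32-byte salt length, and the choice to prepend the salt to the user ID are all hypothetical. The collision estimate uses the standard birthday bound.

```python
import hashlib
import secrets


def new_snapshot_salt() -> bytes:
    """Generate a fresh random salt for one snapshot (32 bytes is an assumption)."""
    return secrets.token_bytes(32)


def pseudonymize_user_id(user_id: str, snapshot_salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + user_id."""
    return hashlib.sha256(snapshot_salt + user_id.encode("utf-8")).hexdigest()


def collision_upper_bound(n_users: int, hash_bits: int = 256) -> float:
    """Birthday bound: P(any collision) <= n(n-1)/2 divided by 2^bits."""
    pairs = n_users * (n_users - 1) // 2
    return pairs / 2 ** hash_bits


salt_q1 = new_snapshot_salt()
salt_q2 = new_snapshot_salt()

# Within one snapshot, the same user always maps to the same hash.
same_snapshot = (
    pseudonymize_user_id("user-123", salt_q1)
    == pseudonymize_user_id("user-123", salt_q1)
)
# Across snapshots, a different salt yields a different hash for the same user.
across_snapshots = (
    pseudonymize_user_id("user-123", salt_q1)
    == pseudonymize_user_id("user-123", salt_q2)
)

bound_for_69_users = collision_upper_bound(69)  # on the order of 1e-74
```

A keyed construction such as HMAC-SHA-256 would be a reasonable alternative to plain concatenation; the sketch keeps to the salted-hash wording used on this page.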

For published per-card statistics, we apply k-anonymity with k=5:

Combinations of (language + spread_type + week) with fewer than 5 observations are aggregated to higher-level groupings before publication
Individual reading IDs never appear in public datasets
Question text is published only in aggregated category counts, never verbatim
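
A minimal sketch of the small-cell rule above, under stated assumptions: the "higher-level grouping" is taken here to mean dropping the week dimension, and any cell still below k after that roll-up is withheld. Neither choice is specified on this page, and the function name `aggregate_small_cells` and the `"all-weeks"` label are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

K = 5  # minimum observations per published cell

Cell = Tuple[str, str, str]  # (language, spread_type, week)


def aggregate_small_cells(rows: Iterable[Cell], k: int = K) -> Dict[Cell, int]:
    """Count observations per cell, rolling cells with fewer than k up to all weeks."""
    fine_counts = Counter(rows)
    published: Counter = Counter()
    for (language, spread_type, week), n in fine_counts.items():
        if n >= k:
            published[(language, spread_type, week)] += n
        else:
            # Assumed roll-up: merge small cells across weeks.
            published[(language, spread_type, "all-weeks")] += n
    # Assumed final step: withhold anything still below k after the roll-up.
    return {cell: n for cell, n in published.items() if n >= k}


rows = (
    [("en", "3-card", "2026-W02")] * 6
    + [("pl", "3-card", "2026-W02")] * 3
    + [("pl", "3-card", "2026-W03")] * 3
    + [("it", "celtic", "2026-W02")] * 2
)
table = aggregate_small_cells(rows)
```

In this example the English cell is published as-is, the two Polish weeks are merged into one cell of 6, and the Italian cell of 2 is withheld.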

The full anonymization audit is performed before each quarterly publication. Audit notes are included in the dataset download.

AI provider attribution

Readings are generated by one of five models, served through three providers (NVIDIA, OpenRouter, Anthropic), depending on user tier and queue status:

NVIDIA Llama 3.3 70B — free tier fallback (last resort)
OpenRouter Gemini 2.5 Flash — primary free tier (≥90% of free readings)
OpenRouter Qwen3-235B — secondary free tier
OpenRouter GPT-5.4 — paid Tier 1 ("Seeker") readings
Anthropic Claude Sonnet 4.6 — paid Tier 2 ("Mystic") dual-oracle readings

Per-reading AI provider attribution is included in the dataset for researchers wanting to compare AI behavior across providers.

Statistical limitations

Three limitations matter:

Sample size. 1,370 readings is enough to detect strong effects (a 50%+ deviation from random, for instance) but not enough for fine-grained per-card significance testing. To claim that a specific card appears more often than chance, we'd need approximately 6,000 readings, per the standard chi-square sample size calculation for a 78-category distribution. At 1,370 readings, we're a little under a quarter of the way there.
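
For readers who want to run a goodness-of-fit check on the published card counts themselves, here is a rough sketch using only the Python standard library. It computes the chi-square statistic against a uniform 78-card distribution and approximates the critical value with the Wilson-Hilferty formula. It is an omnibus test over all cards, not the sample size calculation mentioned above, and the function names are ours for this example.

```python
from statistics import NormalDist
from typing import Sequence

N_CARDS = 78  # full tarot deck


def chi_square_uniform(counts: Sequence[int]) -> float:
    """Chi-square statistic for observed counts vs. a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def approx_critical_value(df: int, alpha: float = 0.05) -> float:
    """Wilson-Hilferty approximation of the upper-alpha chi-square quantile."""
    z = NormalDist().inv_cdf(1 - alpha)
    a = 2 / (9 * df)
    return df * (1 - a + z * a ** 0.5) ** 3


# Perfectly even draws give a statistic of zero.
flat_counts = [10] * N_CARDS
flat_stat = chi_square_uniform(flat_counts)

# Reject uniformity at the 5% level only if the statistic exceeds this value.
critical = approx_critical_value(df=N_CARDS - 1)
```

With 77 degrees of freedom the approximate 5% critical value is about 98.5. A statistic below that is consistent with uniform draws, though it does not prove them.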

Selection bias. Our users are not a representative sample of all tarot users globally. They are people who:

Found aimag.me through search, social, or referral
Speak one of our supported languages
Were comfortable using a web-based AI tarot tool
Self-selected into our funnel

Generalization to "all tarot users" is not warranted from this dataset.

Observational, not experimental. We don't randomize, we don't have a control group, and we can't establish causation. We can describe patterns. We can't claim to explain them.

Update cadence

Quarterly snapshots: January, April, July, October. Published as a versioned dataset with anonymization audit notes.
Real-time aggregate stats: updated daily on this site (live counters, top cards, day-of-week distribution).
Per-reading data: never published in real-time. Always batched into quarterly anonymized snapshots.

Conflict of interest

The author of this research operates aimag.me, the AI tarot tool from which this data is collected. This is disclosed on every page. We have a financial interest in users finding tarot useful enough to subscribe to paid tiers.

To minimize bias from this conflict:

We publish data even when it's unflattering to AI tarot (e.g., the Major:Minor randomness finding directly undermines mystical claims)
We commit to publishing all quarterly snapshots regardless of what they show
We document and explain methodology changes whenever they happen
The dataset itself is open under a Creative Commons license, so anyone can run their own analysis and disagree with our interpretations

License

Published statistics on this site are released under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

Citation format:

aimag.me Tarot Reading Dataset (n=1,370). Collected 2026-01-01 to 2026-05-02. Anonymized open dataset. Available at aimag.me/research.

Questions

For methodology questions, dataset access requests, or replication queries: [email protected].

For RODO/GDPR-related data subject requests, see our Privacy Policy.

Research Methodology — How We Collect and Analyze Tarot Data