Generate scatter plots for analyzing relationships between two variables. Part of the DevTools Surf developer suite.
Use Cases
Visualize the relationship between two continuous variables (price vs. size, age vs. income).
Detect outliers in a dataset before running regression or clustering.
Compare correlation strength across multiple variable pairs.
Show clustering structure before applying k-means or other clustering algorithms.
Tips
Add a regression line and confidence band to make the relationship direction and uncertainty explicit rather than leaving it to visual inference.
Use partial opacity (alpha blending) for overlapping points in dense datasets: fully opaque dots hide how many points are stacked on top of each other.
Log-transform axes when data span multiple orders of magnitude, since a linear scale will compress most points into the lower-left corner. Log scales require strictly positive values.
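The three tips above can be combined in one figure. The following is a minimal sketch, assuming NumPy and Matplotlib are available; the data are synthetic, and the output file name, the 0.2 alpha value, and the normal-approximation confidence band (z = 1.96) are illustrative choices rather than anything this tool prescribes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration): y follows a power law in x
# with multiplicative noise, and x spans about three orders of magnitude.
rng = np.random.default_rng(0)
x = 10 ** rng.uniform(0, 3, size=2000)
y = 2.0 * x ** 0.8 * rng.lognormal(mean=0.0, sigma=0.4, size=x.size)

# Fit a straight line in log-log space: log10(y) = intercept + slope * log10(x).
lx, ly = np.log10(x), np.log10(y)
slope, intercept = np.polyfit(lx, ly, deg=1)

# Approximate 95% confidence band for the fitted mean
# (normal approximation, reasonable for a sample this large).
n = lx.size
resid = ly - (intercept + slope * lx)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))  # residual standard error
grid = np.linspace(lx.min(), lx.max(), 200)
se = s * np.sqrt(1.0 / n + (grid - lx.mean()) ** 2 / np.sum((lx - lx.mean()) ** 2))
fit = intercept + slope * grid
lower, upper = fit - 1.96 * se, fit + 1.96 * se

fig, ax = plt.subplots(figsize=(6, 4))
# Partial opacity: dense regions render darker than sparse ones.
ax.scatter(x, y, s=10, alpha=0.2, edgecolors="none")
# Regression line and band, transformed back to the original units.
ax.plot(10 ** grid, 10 ** fit, color="black", label=f"fit, slope = {slope:.2f}")
ax.fill_between(10 ** grid, 10 ** lower, 10 ** upper,
                color="black", alpha=0.3, label="approx. 95% band")
# Log axes keep the small values from collapsing into one corner.
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("scatter_tips.png", dpi=150)
```

Fitting in log space matches the log axes, so the fitted line appears straight on the plot. With few points, a t-distribution quantile would be a more careful choice than 1.96.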
Fun Facts
The scatter plot is often credited to John Frederick William Herschel, who used it systematically in the 1830s to plot astronomical observations and estimate orbital parameters.
Francis Galton used scatter plots in the 1870s–1880s to discover the concept of regression to the mean, making scatter plots central to the origin of modern statistics.
Anscombe's Quartet (1973) famously showed four datasets with nearly identical summary statistics (means, variances, correlation, and fitted regression line) that look completely different on a scatter plot, illustrating why visualization is essential before analysis.
FAQ
What does a scatter plot show that a correlation coefficient does not?
A scatter plot reveals non-linear relationships, heteroscedasticity (variance that changes across the range), outliers, and multiple distinct groups, all of which a single r value obscures.
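As a concrete illustration, the sketch below computes Pearson's r for the four datasets of Anscombe's Quartet using only the standard library. The helper name `pearson_r` is made up for this example; the data values are the ones published by Anscombe.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Datasets I to III share the same x values; dataset IV has its own.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in results.items():
    print(f"dataset {name}: r = {r:.2f}")  # each r is about 0.82
```

All four correlations agree to two decimal places, yet the shapes differ: dataset I is a noisy line, II is a smooth curve, III is a line with one outlier, and IV has a single high-leverage point. Only a plot makes that difference visible.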
How many points are too many for a scatter plot?
Above roughly 5,000 to 10,000 points, overplotting hides individual points, though the exact threshold depends on marker size, opacity, and plot dimensions. For larger datasets, use hexagonal binning or a 2D density contour to show where points concentrate instead of drawing each one.
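A hexagonal-bin version of a large scatter plot might look like the following. This is a minimal sketch, assuming Matplotlib and NumPy; the synthetic data, the grid size of 40, and the output file name are illustrative assumptions that would need tuning for real data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration): 200,000 correlated points,
# far too many to read as individual markers.
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)

fig, ax = plt.subplots(figsize=(6, 4))
# Each hexagon is colored by how many points fall inside it.
# bins="log" puts the colors on a logarithmic scale so sparse regions stay visible;
# mincnt=1 leaves empty hexagons blank.
hb = ax.hexbin(x, y, gridsize=40, bins="log", mincnt=1, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per hexagon (log color scale)")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("hexbin.png", dpi=150)
```

A smaller `gridsize` gives larger hexagons and a smoother picture; a larger one preserves fine structure at the cost of noisier counts.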