Recipe for Random Rows

Audrey and I are planning on presenting again at SQL Saturday 41 in Atlanta. I’ve decided to present on the different ways to handle slowly changing dimensions. The term Slowly Changing Dimension is associated with datawarehousing, but the general idea can be applied to any dataset, not just a dimension in a datawarehouse. In a nutshell, you are comparing a data source to a data target based on a key, determining which rows already exist in the target, updating the target on those rows, and loading as new inserts the rows from the source which do not exist in the target.

Rather lengthy intro to a post about an only semi-related topic, so I’ll cut to the chase. I needed a large dataset for my presentation. I was shooting for at least a million rows. I decided to go with Sales Reps and Regions, so I now needed many names for my fictional employees. I raided all the baby name sites on the internet for lists of first names. Then I googled for lists of common surnames. I crammed those names into some quick and dirty (Audrey would call “rogue”) tables and mixed them all up using a cross join into a new table. But my end result displeased me. It did not look “random” like it was loaded from a transactional database. The rows were remaining stubbornly alphabetized.

Out to GOOGLE I go. Apparently many folks have encountered this exact same quandary. Here was the solution I found right away. Sort by NEWID(). NEWID creates a UniqueIdentifier datatype. You can use it explicity, or as I’m about to do, you can use it in an order by statement to sort randomly.

So as Audrey stated in her very first blog, this is one of those things that has been out there for a while, but darn it all I didn’t know how to do it until a need for it came up in my life. I used this recipe to create a table with over a million rows.

Again for the sake of the children, I include my sql here:

Use tempdb;

create table FirstNames( FirstName varchar(50));
create table LastNames( LastName varchar(50));

create table FullNames( FirstName varchar(50), LastName varchar(50), FullName varchar(101));

–raid the internet for baby names and common surnames. There are plenty of choices. For my large set,
I used
about 1000 surnames and 1000 first names. Here I’ll do ten of each.
I literally copied the names off of pages and pasted them into text pad docs.
I then loaded them into staging tables with ssis, but you
could also use openrowset, or any other favorite method for cramming data into tables from text. */

insert into FirstNames(FirstName)

insert into LastNames(LastName)

–use a cross join (aka cartesian join) to produce one row for every first and last name combination
–possible between the two tables:

select fn.FirstName, ln.Lastname, fn.FirstName +’ ‘+ ln.Lastname as Fullname
from FirstNames fn
cross join LastNames ln

— boo very alphabetical! This does not look Random. They’re all gonna laugh at you Julie!

select fn.FirstName, ln.Lastname, fn.FirstName +’ ‘+ ln.Lastname, NEWID() as RandomnessMaker
from FirstNames fn
cross join LastNames ln
order by NEWID()
—showing you the value of the newid() for demo purposes, you don’t need to actually view or store it.

–The actual beautiful insert.
insert into FullNames (FirstName, LastName, FullName)
select fn.FirstName, ln.Lastname, fn.FirstName +’ ‘+ ln.Lastname
from FirstNames fn
cross join LastNames ln
order by NEWID()

—voila! Beautiful random names!
select * from FullNames