For my master’s project, I used a dataset that provided a large collection of LLM output text, alongside a numerical label indicating how significant a hallucination the text contained (the consistency rating) for some of the ...
Dirty Crime Data: A Case Study of the Chicago Crime Dataset
Even Crime Data Can’t Stay Clean
We pulled the Chicago Crime dataset straight from the city’s open data portal using a simple curl command. It’s public, it’s official, and, like most real-world data, it’s messier than it looks.
curl -L -o Chicago_Crim ...
Give a dog a bad name
This is a test post.
We at Fancy Company treat our data very badly.
For example, we never give meaningful names to variables:
x  y  z
1  2  3
But what are x, y, and z?
and then forget it
don't know, don't remember it