It's not so hard to find sample data and data sources to use for interesting side-projects, or just for practicing writing SQL.
Most DBMSes come with sample databases. You can write lots of interesting queries against them, and usually a tutorial accompanies the database in the documentation.
- Documentation for Microsoft SQL Server's samples
  - Microsoft's sample database GitHub, which includes the Contoso database
- For MySQL:
  - there's the Employees sample database
  - and the Sakila sample database
- For PostgreSQL:
  - there are several sample DBs in the PostgreSQL wiki
  - there's a link tree of other samples and exercises, too
  - a GitHub repository with a collection of PostgreSQL samples from the old pgfoundry site
- Oracle publishes a manual section about their sample databases
Some websites are full of sample data sets. Why not download an interesting one, learn to load it up, and write your own interesting queries?
Many sites host data sets you can download:
- Kaggle.com is full of sample data!
- FiveThirtyEight.com has lots of neat data sets
- The awesomedata repository on GitHub has a collection of interesting data sets
- Wikipedia has a list of datasets for machine learning research
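
Loading a downloaded file is usually the first hurdle, so here is a minimal sketch of one way to do it with Python's built-in `csv` and `sqlite3` modules. It assumes the data set is a CSV file with a header row. The `load_csv` helper, the `ridership` table, and the sample rows are all made up for illustration; real data sets will need their own column types and cleanup.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Invented stand-in for a downloaded file. With a real data set you would
# pass open("your_file.csv", newline="") instead of a StringIO.
SAMPLE_CSV = """route,year,riders
A,2021,120
A,2022,150
B,2021,900
B,2022,950
"""

conn = sqlite3.connect(":memory:")
load_csv(conn, "ridership", io.StringIO(SAMPLE_CSV))

# Every value was loaded as text, so cast before doing arithmetic.
query = """
    SELECT route, SUM(CAST(riders AS INTEGER)) AS total_riders
    FROM ridership
    GROUP BY route
    ORDER BY total_riders DESC
"""
totals = conn.execute(query).fetchall()
for route, total in totals:
    print(route, total)
```

Loading everything as text keeps the sketch short. Once you know the data, declaring real column types in the `CREATE TABLE` statement makes the queries simpler.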
Of course, some sample data is built for generic tutorials, by third parties:
- SqlSkills.com publishes sample databases for SQL Server, which include some corrupt databases so you can practice recovery operations
- SQLTutorial.com's Sample Database is available for several vendors
There are some sites that let you write queries interactively with canned data, rather than having you download data to play with on your own.
- I haven't used it, but I've seen people recommend SQLZoo.net
- LearnSQL.com has a blog post called "Learning SQL? 12 Ways to Practice SQL Online" with lots of resources.
- Sylvia Moestl Vasilik's website (which supports their book) has almost 60 practice problems.
Some sites publish data by making their backups available, or dumping the data they use to make their own reports.
- Wikipedia publishes all of the content of Wikipedia as SQL scripts for MySQL, plus as XML files. You can get that data (or subsets of it) and play around.
- Stack Overflow makes its developer survey data sets available each year.
  - You can also get a Stack Overflow "demo" database that includes the text of questions and answers
- Some governments openly publish data about their cities and residents:
  - London Open Data
  - New York City Open Data
  - Seattle Open Data
  - Tokyo open data (in Japanese, obviously)
  - Find open data at data.gov.uk
- IMDb makes several data sets available
Some data sources produce data live, as it happens. These are interesting sources because they usually represent slowly changing dimensions, and will need to be accumulated or logged before being stored or processed.
- Wikipedia Event Streams can show edits that are happening on Wikipedia, as they happen.
- The Twitter API provides a way to stream a subset of all tweets in real time.
- General Transit Feed Specification (GTFS) data is provided by many metropolitan areas to describe the movement of their transportation infrastructure; where are scheduled buses and trains right now?
  - In the New York City area, the MTA provides GTFS data.
  - You can find GTFS feeds for Seattle, and their live data through other APIs.
  - Tokyo (and other municipalities in Japan) have hosted transit data challenges to encourage use of their data.
- Some games make gameplay data available in real time. Supercell's Clash Royale, for example, has a gameplay API.
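
Because live data disappears once it has streamed past, it helps to log every event before trying to analyze it. The sketch below shows one possible shape for that: an append-only SQLite table that records when each event arrived, along with its raw payload. The `fake_event_stream` generator, the `event_log` table, and the event fields (`vehicle_id`, `stop`) are all hypothetical stand-ins. A real feed would need its own client code and its own schema.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) an append-only event log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS event_log (
            id INTEGER PRIMARY KEY,
            received_at TEXT NOT NULL,
            vehicle_id TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )
    return conn


def log_event(conn, event):
    """Append one event, keeping the arrival time and the raw JSON payload."""
    conn.execute(
        "INSERT INTO event_log (received_at, vehicle_id, payload) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), event["vehicle_id"], json.dumps(event)),
    )


def fake_event_stream():
    """Hypothetical stand-in for a live feed; these events are invented."""
    yield {"vehicle_id": "bus-1", "stop": "A"}
    yield {"vehicle_id": "bus-2", "stop": "A"}
    yield {"vehicle_id": "bus-1", "stop": "B"}


conn = open_log()
for event in fake_event_stream():
    log_event(conn, event)
conn.commit()

# Once events are accumulated, ordinary SQL can answer questions about them,
# such as "what is the most recent event for each vehicle?"
latest = conn.execute(
    """
    SELECT vehicle_id, payload
    FROM event_log
    WHERE id IN (SELECT MAX(id) FROM event_log GROUP BY vehicle_id)
    ORDER BY vehicle_id
    """
).fetchall()
latest_stops = {vehicle: json.loads(payload)["stop"] for vehicle, payload in latest}
print(latest_stops)
```

Keeping the raw payload next to a few extracted columns means you can re-parse old events later if you decide you need a field you did not pull out at first.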
There's data everywhere! If you don't like these sources, you can try finding other data sets.
- Once you know the protocol or format, search for it! The OneBusAway API and GTFS protocols are about public transportation data, so search for "GTFS Data {YourCity}".
- Search for APIs for your favorite game or game server.
- GitHub repositories are tagged with topics you can search, so try #sample-databases, #opendata, or #datasets. What other tags can you find?