Being able to write SQL is important, and being able to query a database efficiently is considered one of the most basic skills for a data analyst/scientist.
SQL is not only important but also very widely used. According to the 2021 Stack Overflow Developer Survey, SQL is one of the five most commonly used programming languages, so it is well worth investing time in learning it.
But that raises a question: how do you practice database queries without a database?
In today's article, we tackle this basic problem and learn how to create our own MySQL database from scratch. With the help of Python and a few external libraries, we will write a simple script that automatically creates our tables and populates them with randomly generated data.
Before discussing the implementation details, however, we need to cover some prerequisites.
Note: There are of course other ways to get a SQL database for practice (such as downloading one directly), but using Python and some external libraries gives us additional, valuable practice.
Prerequisites
Let's start with the basics.
First, install MySQL Workbench and connect to your MySQL server. Then you can create the database:
CREATE DATABASE IF NOT EXISTS your_database_name;
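Now we just need to install the necessary Python libraries and the basic setup is done. The libraries we will be using are listed below and can easily be installed from the terminal.
1. NumPy: pip install numpy
2. SQLAlchemy: pip install "sqlalchemy<2.0" (the script below uses the 1.x API; MetaData(bind=...) was removed in SQLAlchemy 2.0)
3. Faker: pip install faker
4. PyMySQL: pip install pymysql (the driver named in the mysql+pymysql connection URL used later)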
Create the script
After completing the basic setup, we can start writing the Python script.
We start by creating a class with some boilerplate code, which gives us a blueprint to guide us through the rest of the implementation.
import numpy as np
import sqlalchemy
from faker import Faker
from sqlalchemy import Table, Column, Integer, String, MetaData, Date, ForeignKey


class SQLData:
    def __init__(self, server: str, db: str, uid: str, pwd: str) -> None:
        self.__fake = Faker()       # generator for fake names, addresses, etc.
        self.__server = server
        self.__db = db
        self.__uid = uid
        self.__pwd = pwd
        self.__tables = dict()      # table name -> sqlalchemy Table object

    def connect(self) -> None:
        pass

    def drop_all_tables(self) -> None:
        pass

    def create_tables(self) -> None:
        pass

    def populate_tables(self) -> None:
        pass
We haven't used any particularly advanced syntax yet. We simply imported the libraries, created a class, stored the database credentials for later use, and defined some method stubs.
Establish a connection
The first thing we want to do is create a database connection.
Fortunately, the Python library SQLAlchemy does most of the work for us.
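class SQLData:
    #...
    def connect(self) -> None:
        # The mysql+pymysql scheme requires the PyMySQL driver to be installed.
        self.__engine = sqlalchemy.create_engine(
            f"mysql+pymysql://{self.__uid}:{self.__pwd}@{self.__server}/{self.__db}"
        )
        self.__conn = self.__engine.connect()
        # MetaData(bind=...) is SQLAlchemy 1.x API; it was removed in 2.0.
        self.__meta = MetaData(bind=self.__engine)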
This method creates three objects and stores them as instance attributes.
First, we create an engine. It serves as the starting point for any SQLAlchemy application and describes how to talk to a specific kind of database/DBAPI combination.
In our case, we specify a MySQL database and pass in our credentials.
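One caveat with building the URL from an f-string: if the password contains characters such as @ or /, the URL may be parsed incorrectly. Below is a minimal sketch of percent-escaping the credentials with the standard library first. The helper name build_mysql_url and the sample credentials are made up for illustration and are not part of the original script.

```python
from urllib.parse import quote_plus


def build_mysql_url(uid: str, pwd: str, server: str, db: str) -> str:
    """Return a mysql+pymysql URL with user name and password percent-escaped.

    Hypothetical helper for illustration; the original script builds the
    URL directly with an f-string.
    """
    return f"mysql+pymysql://{quote_plus(uid)}:{quote_plus(pwd)}@{server}/{db}"


# Made-up credentials: the "@" and "/" in the password would otherwise be
# read as URL separators.
url = build_mysql_url("root", "p@ss/word", "localhost", "yourdatabase")
print(url)
```

The resulting string can be passed to sqlalchemy.create_engine() in place of the f-string.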
Next, we create a connection, which lets us execute SQL statements, and a MetaData object, a container that holds the table definitions of the database and lets us associate and access them.
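To get a feel for these three objects without a running MySQL server, here is a small sketch that creates them against an in-memory SQLite database. The SQLite URL is an assumption made only so the example runs anywhere; it uses calls that should be available in SQLAlchemy 1.4 and later.

```python
import sqlalchemy
from sqlalchemy import MetaData, text

# Assumption: an in-memory SQLite database stands in for the MySQL server,
# so the sketch runs without any credentials.
engine = sqlalchemy.create_engine("sqlite:///:memory:")

# The MetaData container starts out empty; tables register themselves in it
# when they are defined.
meta = MetaData()

# The connection is the object that actually executes SQL statements.
with engine.connect() as conn:
    answer = conn.execute(text("SELECT 1")).scalar()

print(engine.dialect.name)   # name of the database backend in use
print(len(meta.tables))      # number of tables registered so far
print(answer)                # value returned by the test query
```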
Create a table
Now, we need to create the database tables.
class SQLData:
    #...
    def create_tables(self) -> None:
        self.__tables['jobs'] = Table(
            'jobs', self.__meta,
            Column('job_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('description', String(255))
        )
        self.__tables['companies'] = Table(
            'companies', self.__meta,
            Column('company_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('name', String(255), nullable=False),
            Column('phrase', String(255)),
            Column('address', String(255)),
            Column('country', String(255)),
            Column('est_date', Date)
        )
        # persons references jobs and companies through foreign keys
        self.__tables['persons'] = Table(
            'persons', self.__meta,
            Column('person_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('job_id', Integer, ForeignKey('jobs.job_id'), nullable=False),
            Column('company_id', Integer, ForeignKey('companies.company_id'), nullable=False),
            Column('last_name', String(255), nullable=False),
            Column('first_name', String(255)),
            Column('date_of_birth', Date),
            Column('address', String(255)),
            Column('country', String(255)),
            Column('zipcode', String(10)),
            Column('salary', Integer)
        )
        # Emit CREATE TABLE for every registered table; existing tables are skipped.
        # Passing the engine explicitly does not rely on the MetaData being bound.
        self.__meta.create_all(self.__engine)
We created three tables and stored them in a dictionary for later reference.
Creating tables in SQLAlchemy is also very simple. We just instantiate a new Table, providing the table name, the MetaData object, and the different columns.
In this example, we created a jobs table, a companies table and a persons table. The persons table also links to the other two tables via foreign keys, which makes the database more interesting for practicing SQL joins.
After all the tables are defined, we simply call the create_all() method of the MetaData object.
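The following sketch repeats the same steps on a reduced scale and then checks what was actually created. It assumes an in-memory SQLite database and two cut-down tables rather than the article's MySQL schema. It also shows drop_all(), which is one plausible way to fill in the drop_all_tables() stub that the article leaves empty.

```python
import sqlalchemy
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey, inspect

# Assumption: in-memory SQLite instead of MySQL, and two cut-down tables.
engine = sqlalchemy.create_engine("sqlite:///:memory:")
meta = MetaData()

jobs = Table(
    "jobs", meta,
    Column("job_id", Integer, primary_key=True),
    Column("description", String(255)),
)
persons = Table(
    "persons", meta,
    Column("person_id", Integer, primary_key=True),
    Column("job_id", Integer, ForeignKey("jobs.job_id"), nullable=False),
    Column("last_name", String(255), nullable=False),
)

# Create every table registered in the MetaData container.
meta.create_all(engine)

# Reflect the database to confirm the tables and the foreign key exist.
inspector = inspect(engine)
created = sorted(inspector.get_table_names())
fk = inspector.get_foreign_keys("persons")[0]
print(created)
print(fk["constrained_columns"], "->", fk["referred_table"], fk["referred_columns"])

# drop_all() is the counterpart of create_all().
meta.drop_all(engine)
remaining = inspect(engine).get_table_names()
print(remaining)
```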
Generate some random data
Although we created the database tables, they do not contain any data yet. So we need to generate some random data and insert it into the tables.
class SQLData:
    #...
    def populate_tables(self) -> None:
        jobs_ins = list()
        companies_ins = list()
        persons_ins = list()

        # 100 jobs
        for _ in range(100):
            record = dict()
            record['description'] = self.__fake.job()
            jobs_ins.append(record)

        # 100 companies
        for _ in range(100):
            record = dict()
            record['name'] = self.__fake.company()
            record['phrase'] = self.__fake.catch_phrase()
            record['address'] = self.__fake.street_address()
            record['country'] = self.__fake.country()
            record['est_date'] = self.__fake.date_of_birth()
            companies_ins.append(record)

        # 500 persons, each pointing at a random job and company
        for _ in range(500):
            record = dict()
            # randint excludes the upper bound, so 101 covers the ids 1..100
            record['job_id'] = np.random.randint(1, 101)
            record['company_id'] = np.random.randint(1, 101)
            record['last_name'] = self.__fake.last_name()
            record['first_name'] = self.__fake.first_name()
            record['date_of_birth'] = self.__fake.date_of_birth()
            record['address'] = self.__fake.street_address()
            record['country'] = self.__fake.country()
            record['zipcode'] = self.__fake.zipcode()
            record['salary'] = np.random.randint(60000, 150000)
            persons_ins.append(record)

        # Insert the parent tables first so the foreign keys in persons resolve.
        self.__conn.execute(self.__tables['jobs'].insert(), jobs_ins)
        self.__conn.execute(self.__tables['companies'].insert(), companies_ins)
        self.__conn.execute(self.__tables['persons'].insert(), persons_ins)
Here we use the Faker library to generate the random data.
In each for loop, we build a new record, represented by a dictionary, from randomly generated values. Each record is then appended to a list that can be passed to a single multi-row insert statement.
Next, we call the execute() method of the connection object, passing the insert statement together with the list of dictionaries.
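The pattern of passing a list of dictionaries to execute() can be tried in isolation. The sketch below assumes an in-memory SQLite database and three hand-written records instead of Faker output. It wraps the insert in engine.begin(), which commits when the block exits without an error. This is a slightly different style from the article's long-lived connection, chosen here so the commit is explicit.

```python
import sqlalchemy
from sqlalchemy import Table, Column, Integer, String, MetaData, select, func

# Assumption: in-memory SQLite instead of MySQL.
engine = sqlalchemy.create_engine("sqlite:///:memory:")
meta = MetaData()
jobs = Table(
    "jobs", meta,
    Column("job_id", Integer, primary_key=True),
    Column("description", String(255)),
)
meta.create_all(engine)

# Hand-written records instead of Faker output, to keep the sketch small.
jobs_ins = [
    {"description": "Data analyst"},
    {"description": "Data engineer"},
    {"description": "Statistician"},
]

# One execute() call with a list of dictionaries inserts all rows;
# engine.begin() commits the transaction when the block ends normally.
with engine.begin() as conn:
    conn.execute(jobs.insert(), jobs_ins)

# Read back the row count and the generated primary keys.
with engine.connect() as conn:
    row_count = conn.execute(select(func.count()).select_from(jobs)).scalar()
    ids = [row[0] for row in conn.execute(select(jobs.c.job_id).order_by(jobs.c.job_id))]

print(row_count, ids)
```

Note that job_id is not part of the dictionaries; the database assigns it.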
That's it! We have successfully implemented the class. To create the database contents, just instantiate the class and call the relevant methods.
if __name__ == '__main__':
    sql = SQLData('localhost', 'yourdatabase', 'root', 'yourpassword')
    sql.connect()
    sql.create_tables()
    sql.populate_tables()
Try a query
The only thing left is to verify that our database is up and running and does indeed contain some data.
Start with a basic query:
SELECT *
FROM jobs
LIMIT 10;
Basic query results [Image by author]
It looks like our script succeeded and we have a database with actual data.
Now, try a more complex SQL statement:
SELECT
    p.first_name,
    p.last_name,
    p.salary,
    j.description
FROM
    persons AS p
JOIN
    jobs AS j ON
    p.job_id = j.job_id
WHERE
    p.salary > 130000
ORDER BY
    p.salary DESC;
The result looks plausible, so we can say that our database is working properly.
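To see what "plausible" means here, it can help to run the same join on a handful of rows where the expected answer is obvious. The sketch below uses Python's built-in sqlite3 module with made-up rows. It is an assumption-laden stand-in for the MySQL database, with only the columns the query needs.

```python
import sqlite3

# Assumption: a throwaway in-memory SQLite database with hand-made rows,
# not the MySQL database filled by the script.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        job_id INTEGER NOT NULL REFERENCES jobs(job_id),
        first_name TEXT, last_name TEXT, salary INTEGER
    );
    INSERT INTO jobs VALUES (1, 'Data analyst'), (2, 'Data engineer');
    INSERT INTO persons VALUES
        (1, 1, 'Ada', 'Smith', 140000),
        (2, 2, 'Ben', 'Jones', 90000),
        (3, 2, 'Cleo', 'Brown', 135000);
""")

# Same query as in the article: only salaries above 130000, highest first.
rows = conn.execute("""
    SELECT p.first_name, p.last_name, p.salary, j.description
    FROM persons AS p
    JOIN jobs AS j ON p.job_id = j.job_id
    WHERE p.salary > 130000
    ORDER BY p.salary DESC;
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Ben is filtered out by the salary condition, and each remaining person carries the description of the job their job_id points to.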
Conclusion
In this article, we learned how to utilize Python and some external libraries to create our own practice database with randomly generated data.
While it's easy to download an existing database to start practicing SQL, creating your own database from scratch in Python provides additional learning opportunities. Since SQL and Python are often closely linked, these learning opportunities can be particularly useful.