How to Create a SQL Database with Python

Being able to write SQL is important, and being able to query a database efficiently is considered one of the most basic skills for a data analyst/scientist.

SQL is not only important but also very widely used. According to the 2021 Stack Overflow Developer Survey, SQL is one of the five most used programming languages, so it is worth investing time in learning it.

Character illustrations by Storyset

But this raises a question: how do you practice database queries without a database?

In today's article, let's tackle this basic problem and learn how to create your own MySQL database from scratch. With the help of Python and some external libraries, we'll write a simple script that automatically creates our tables and populates them with randomly generated data.

However, before discussing the implementation details, we first need to cover some prerequisites.

Note: Of course there are other ways to get a SQL database for practice (such as downloading an existing one), but using Python and some external libraries gives us additional and valuable practice opportunities.

Prerequisites

Let's start with the basics.

First, install MySQL Workbench and connect to your MySQL server. Then you can create the database:

CREATE DATABASE IF NOT EXISTS your_database_name;

Now, we just need to install the necessary Python libraries and the basic setup is done. The libraries we will be using are listed below and can be easily installed via the terminal.

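  • 1. NumPy:  pip install numpy

  • 2. SQLAlchemy:  pip install "sqlalchemy<2.0"  (the script uses the 1.x API, such as MetaData(bind=...), which was removed in 2.0)

  • 3. Faker:  pip install faker

  • 4. PyMySQL:  pip install pymysql  (the MySQL driver named in the connection URL of the script)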

Create the script

After completing the basic setup, we can start writing the Python script.

Start by creating a class with some boilerplate code. It gives us a blueprint that guides us through the rest of the implementation.

import numpy as np
import sqlalchemy
from faker import Faker
from sqlalchemy import Table, Column, Integer, String, MetaData, Date, ForeignKey


class SQLData:
    def __init__(self, server: str, db: str, uid: str, pwd: str) -> None:
        self.__fake = Faker()  # generator for the random test data
        self.__server = server
        self.__db = db
        self.__uid = uid
        self.__pwd = pwd
        self.__tables = dict()  # maps table name -> sqlalchemy Table

    def connect(self) -> None:
        pass

    def drop_all_tables(self) -> None:
        pass

    def create_tables(self) -> None:
        pass

    def populate_tables(self) -> None:
        pass

We haven't used any particularly advanced syntax yet. We simply imported the libraries, created a class, stored the database credentials for later use, and defined some empty methods.

Establish a connection

The first thing we want to do is create a database connection.

Fortunately, the Python library SQLAlchemy does most of the work for us.

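class SQLData:
    #...

    def connect(self) -> None:
        # Engine: knows which dialect (MySQL) and driver (PyMySQL) to use
        self.__engine = sqlalchemy.create_engine(
            f"mysql+pymysql://{self.__uid}:{self.__pwd}@{self.__server}/{self.__db}"
        )
        # Connection: used to execute SQL statements
        self.__conn = self.__engine.connect()
        # MetaData: container for the table definitions, bound to the engine
        self.__meta = MetaData(bind=self.__engine)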

This method creates three objects and stores them as instance attributes.

First, we create an engine, which serves as the starting point for any SQLAlchemy application and describes how to talk to a specific type of database/DBAPI combination.

In our case, we specify a MySQL database and pass in our credentials.
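One caveat with building the URL through an f-string: if the password contains characters such as @ or /, the URL is parsed incorrectly. A minimal sketch of escaping the credentials with the standard library's urllib.parse.quote_plus follows. The helper name build_mysql_url and the sample credentials are hypothetical and not part of the original script.

```python
from urllib.parse import quote_plus


def build_mysql_url(uid: str, pwd: str, server: str, db: str) -> str:
    """Build a mysql+pymysql URL with the credentials percent-encoded."""
    # quote_plus escapes characters like '@' and '/' that would otherwise
    # be mistaken for URL delimiters.
    return f"mysql+pymysql://{quote_plus(uid)}:{quote_plus(pwd)}@{server}/{db}"


# Hypothetical credentials, for illustration only
url = build_mysql_url("root", "p@ss/word", "localhost", "yourdatabase")
print(url)
```

The resulting string could be passed to sqlalchemy.create_engine() in place of the f-string used in connect().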

Next, we create a connection, which lets us execute SQL statements, and a MetaData object, a container that collects the table definitions of the database so that we can associate and access them.
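To try these three objects without a running MySQL server, here is a small sketch that swaps in an in-memory SQLite database. The SQLite URL is an assumption made for illustration; the article's script connects to MySQL instead.

```python
from sqlalchemy import MetaData, create_engine, text

# Assumption: in-memory SQLite instead of MySQL, so no server is required
engine = create_engine("sqlite:///:memory:")
meta = MetaData()  # empty container; tables register themselves here later

with engine.connect() as conn:
    # text() wraps a raw SQL string so the connection can execute it
    answer = conn.execute(text("SELECT 1")).scalar()

print(engine.dialect.name)
print(answer)
print(list(meta.tables))
```

The engine only describes how to reach the database, the connection is what actually runs statements, and the MetaData object stays empty until tables are defined against it.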

Create a table

Now, we need to create the database tables.


class SQLData:
    #...

    def create_tables(self) -> None:
        self.__tables['jobs'] = Table(
            'jobs', self.__meta,
            Column('job_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('description', String(255))
        )

        self.__tables['companies'] = Table(
            'companies', self.__meta,
            Column('company_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('name', String(255), nullable=False),
            Column('phrase', String(255)),
            Column('address', String(255)),
            Column('country', String(255)),
            Column('est_date', Date)
        )

        # persons references jobs and companies through foreign keys
        self.__tables['persons'] = Table(
            'persons', self.__meta,
            Column('person_id', Integer, primary_key=True, autoincrement=True, nullable=False),
            Column('job_id', Integer, ForeignKey('jobs.job_id'), nullable=False),
            Column('company_id', Integer, ForeignKey('companies.company_id'), nullable=False),
            Column('last_name', String(255), nullable=False),
            Column('first_name', String(255)),
            Column('date_of_birth', Date),
            Column('address', String(255)),
            Column('country', String(255)),
            Column('zipcode', String(10)),
            Column('salary', Integer)
        )

        # Emit CREATE TABLE for every table registered on the MetaData object
        self.__meta.create_all()

We created 3 tables and stored them in a dictionary for later reference.

Creating tables in sqlalchemy is also very simple. We just instantiate a new table, provide the table name, the metadata object, and specify the different columns.

In this example, we created a jobs table, a companies table and a persons table. The persons table also links to the other two tables via foreign keys, which makes the database more interesting for practicing SQL joins.

After all the tables are defined, we simply call the create_all() method of the MetaData object.
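As a reduced illustration of how create_all() handles foreign keys, the sketch below defines the child table before its parent and still gets the tables created in a valid order. It assumes an in-memory SQLite database and a trimmed-down two-table schema, neither of which comes from the original script.

```python
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String, Table,
                        create_engine, inspect)

# Assumption: in-memory SQLite stands in for the MySQL database
engine = create_engine("sqlite:///:memory:")
meta = MetaData()

# Child table, defined first on purpose: the foreign key is given as a
# string and only resolved once the jobs table exists on the MetaData
persons = Table(
    "persons", meta,
    Column("person_id", Integer, primary_key=True),
    Column("job_id", Integer, ForeignKey("jobs.job_id"), nullable=False),
    Column("last_name", String(255), nullable=False),
)

# Parent table
jobs = Table(
    "jobs", meta,
    Column("job_id", Integer, primary_key=True),
    Column("description", String(255)),
)

# Passing the engine explicitly works whether or not the MetaData is bound
meta.create_all(engine)

# sorted_tables lists tables in dependency order (parents before children)
creation_order = [t.name for t in meta.sorted_tables]
created = sorted(inspect(engine).get_table_names())
print(creation_order)
print(created)
```

Because the dependency order is worked out from the foreign keys, the order in which the Table objects are written in create_tables() does not matter.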

Generate some random data

Although we created the database tables, there is still no data available. So we need to generate some random data and insert it into the tables.


class SQLData:
    #...

    def populate_tables(self) -> None:
        jobs_ins = list()
        companies_ins = list()
        persons_ins = list()

        # 100 jobs
        for _ in range(100):
            record = dict()
            record['description'] = self.__fake.job()
            jobs_ins.append(record)

        # 100 companies
        for _ in range(100):
            record = dict()
            record['name'] = self.__fake.company()
            record['phrase'] = self.__fake.catch_phrase()
            record['address'] = self.__fake.street_address()
            record['country'] = self.__fake.country()
            record['est_date'] = self.__fake.date_of_birth()
            companies_ins.append(record)

        # 500 persons, each linked to a random job and company
        for _ in range(500):
            record = dict()
            # randint's upper bound is exclusive, so 101 covers ids 1 to 100
            record['job_id'] = np.random.randint(1, 101)
            record['company_id'] = np.random.randint(1, 101)
            record['last_name'] = self.__fake.last_name()
            record['first_name'] = self.__fake.first_name()
            record['date_of_birth'] = self.__fake.date_of_birth()
            record['address'] = self.__fake.street_address()
            record['country'] = self.__fake.country()
            record['zipcode'] = self.__fake.zipcode()
            record['salary'] = np.random.randint(60000, 150000)
            persons_ins.append(record)

        # Insert parents first so the foreign keys in persons are valid
        self.__conn.execute(self.__tables['jobs'].insert(), jobs_ins)
        self.__conn.execute(self.__tables['companies'].insert(), companies_ins)
        self.__conn.execute(self.__tables['persons'].insert(), persons_ins)

Here we use the Faker library to generate the random data.

In each for loop, we build a new record, represented by a dictionary, from randomly generated values. Each record is then appended to a list that can be used in a single multi-row insert statement.

Next, we call the execute() method of the connection object, passing the table's insert statement and the list of dictionaries as arguments.
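A small sketch of this multi-row insert, under the assumption of an in-memory SQLite database and three hand-written records in place of the Faker output:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, text

# Assumption: in-memory SQLite instead of MySQL
engine = create_engine("sqlite:///:memory:")
meta = MetaData()
jobs = Table(
    "jobs", meta,
    Column("job_id", Integer, primary_key=True),
    Column("description", String(255)),
)
meta.create_all(engine)

# Hypothetical hand-written records; the script builds these with Faker
jobs_ins = [
    {"description": "Data analyst"},
    {"description": "Data engineer"},
    {"description": "Statistician"},
]

# engine.begin() opens a transaction and commits it when the block exits
with engine.begin() as conn:
    # One insert statement plus a list of dicts inserts all rows at once
    conn.execute(jobs.insert(), jobs_ins)

with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM jobs")).scalar()

print(row_count)
```

The explicit transaction is a design choice of this sketch: SQLAlchemy 2.0 no longer autocommits on a plain connection, so wrapping the inserts in engine.begin() keeps the example working on both 1.x and 2.x.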

That's it! We have successfully implemented the class. Now just instantiate it and call the relevant methods to create and populate the tables.

if __name__ == '__main__':
    # Replace the database name and credentials with your own
    sql = SQLData('localhost', 'yourdatabase', 'root', 'yourpassword')
    sql.connect()
    sql.create_tables()
    sql.populate_tables()
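The drop_all_tables() method is left as an empty stub in the class. One possible way to fill it in, shown here only as a sketch and not as the original author's implementation, is to reflect the existing tables and drop them. The example assumes an in-memory SQLite database with two throwaway tables.

```python
from sqlalchemy import MetaData, create_engine, inspect, text

# Assumption: in-memory SQLite instead of MySQL
engine = create_engine("sqlite:///:memory:")

# Create two throwaway tables so there is something to drop
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE jobs (job_id INTEGER PRIMARY KEY)"))
    conn.execute(text(
        "CREATE TABLE persons ("
        "person_id INTEGER PRIMARY KEY, "
        "job_id INTEGER REFERENCES jobs (job_id))"
    ))

tables_before = sorted(inspect(engine).get_table_names())

# reflect() loads the definitions of the tables that exist in the database;
# drop_all() then drops them, children before parents
meta = MetaData()
meta.reflect(bind=engine)
meta.drop_all(bind=engine)

tables_after = inspect(engine).get_table_names()
print(tables_before)
print(tables_after)
```

Inside the class, the same two calls could run against the stored engine before create_tables(), so that the script can be rerun without leftover tables.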

Try a query

The only thing left is to verify that our database is up and running and does indeed contain some data.

Start with a basic query:


SELECT *
FROM jobs
LIMIT 10;

Basic query results [Image by author]

It looks like our script succeeded and we have a database with actual data.

Now, try a more complex SQL statement:


SELECT
  p.first_name,
  p.last_name,
  p.salary,
  j.description
FROM
  persons AS p
JOIN
  jobs AS j ON
  p.job_id = j.job_id
WHERE
  p.salary > 130000
ORDER BY
  p.salary DESC;

 

This result looks plausible - we can say that our database is functioning properly.
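To see what this join returns without a MySQL server, the sketch below runs the same query with Python's built-in sqlite3 module. The three persons and two jobs are hand-written, hypothetical rows rather than output of the script.

```python
import sqlite3

# Assumption: stdlib SQLite with hand-written rows instead of the MySQL data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE persons (
        person_id INTEGER PRIMARY KEY,
        job_id INTEGER NOT NULL REFERENCES jobs (job_id),
        first_name TEXT, last_name TEXT, salary INTEGER
    );
    INSERT INTO jobs VALUES (1, 'Data analyst'), (2, 'Data engineer');
    INSERT INTO persons VALUES
        (1, 1, 'Ada', 'Lee', 140000),
        (2, 2, 'Bo', 'Kim', 135000),
        (3, 1, 'Cy', 'Roe', 90000);
""")

# Same join as above: persons earning more than 130000, highest salary first
rows = conn.execute("""
    SELECT p.first_name, p.last_name, p.salary, j.description
    FROM persons AS p
    JOIN jobs AS j ON p.job_id = j.job_id
    WHERE p.salary > 130000
    ORDER BY p.salary DESC
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Only the two persons above the salary threshold come back, each paired with the description of the job that their job_id points to.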

Conclusion

In this article, we learned how to utilize Python and some external libraries to create our own practice database with randomly generated data.

While it's easy to download an existing database to start practicing SQL, creating your own database from scratch in Python provides additional learning opportunities. Since SQL and Python are often closely linked, these learning opportunities can be particularly useful.

Origin blog.csdn.net/L010409/article/details/123272232