Six Amazing Libraries for Python

1. Description

        I've been using Python extensively for the past few years, so I'm always on the lookout for amazing libraries that can enhance my work in data engineering and business intelligence projects. In the past, I've shared two articles: Five Cool Data Science Python Libraries and Six Cool Python Libraries I've Come across Recently.

        In this article, I share six more amazing Python libraries that I now use at work.

2. Humanize

        Humanize provides simple, human-readable string formatting for numbers, dates, and times. The goal of the library is to make data more human-friendly, for example by converting a number of seconds into a readable string such as "2 minutes ago". The library can format data in a number of ways, including adding commas to numbers and converting timestamps to relative times.

I often work with integers and dates and times in data engineering projects.

2.1 Installation

!pip install humanize

2.2 Example (integer)

# Importing the library
import humanize

# Formatting a number with thousands separators
a = humanize.intcomma(951009)

# Converting a large number into words
b = humanize.intword(10046328394)

# Printing the results
print(a)  # 951,009
print(b)  # 10.0 billion

2.3 Example (date and time)

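# Importing libraries
import humanize
import datetime as dt

# naturaldate() includes the year for dates far from today; naturalday() omits it
a = humanize.naturaldate(dt.date(2012, 6, 5))
b = humanize.naturalday(dt.date(2012, 6, 5))

print(a)  # Jun 05 2012
print(b)  # Jun 05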

        For more formatting options, check out the Humanize documentation.

3. Pendulum

        Although there are many libraries available for datetimes in Python, I find Pendulum the easiest to use for any kind of date manipulation, and it is my favorite library at work. It extends the built-in Python datetime module, adding a more intuitive API for working with timezones and for operations on dates and times, such as adding intervals, subtracting dates, and converting between timezones. It also provides a simple, human-friendly API for formatting dates and times.

3.1 Installation

!pip install pendulum 

3.2 Examples


# Importing the library
import pendulum

# datetime() creates a timezone-aware instance (UTC by default)
dt = pendulum.datetime(2023, 1, 31)
print(dt)

# local() creates a datetime instance in the local timezone
local = pendulum.local(2023, 1, 31)
print("Local Time:", local)
print("Local Time Zone:", local.timezone.name)

# Getting the current UTC time
utc = pendulum.now('UTC')
print("Current UTC time:", utc)

# Converting the UTC time to Europe/Paris time
europe = utc.in_timezone('Europe/Paris')
print("Current time in Paris:", europe)

        This library deserves a separate blog post of its own, so I show only a few examples here. For more formats, check the Pendulum documentation.

4. FTFY

        Have you ever encountered a situation where foreign-language text in your data was not displayed correctly? This is called mojibake, a term for garbled or gibberish text caused by encoding or decoding problems. It usually happens when text written in one character encoding is incorrectly decoded using a different encoding. The ftfy Python library helps you fix mojibake, which is very useful in NLP use cases.
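        To make the cause concrete, here is a small sketch that produces mojibake on purpose using only the standard library, then reverses it by hand. The sample string is my own; ftfy's value is that it detects and undoes this kind of mistake without being told which encodings were involved.

```python
# Text as it was originally written
original = "réflexion ✔"

# The mistake: the text is stored as UTF-8 bytes, but read back as Windows-1252
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # rÃ©flexion âœ”

# Undoing the mistake by hand works only if we already know both encodings
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```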

4.1 Installation

!pip install ftfy 

4.2 Examples

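# Importing the library
import ftfy

# Each input below is mojibake: UTF-8 text that was wrongly decoded as Windows-1252
print(ftfy.fix_text('Correct the sentence using â€œftfyâ€\x9d.'))
print(ftfy.fix_text('âœ” No problems with text'))
print(ftfy.fix_text('Ã\xa0 perturber la rÃ©flexion'))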

        In addition to mojibake, ftfy will fix incorrect encodings, incorrect line endings, and incorrect quotes. According to the documentation, ftfy can understand text that was decoded as any of the following encodings:

  • Latin-1 (ISO-8859-1)
  • Windows-1252 (cp1252, used in Microsoft products)
  • Windows-1251 (cp1251, the Russian version of cp1252)
  • Windows-1250 (cp1250, the Eastern European version of cp1252)
  • ISO-8859-2 (not exactly the same as Windows-1250)
  • MacRoman (used on Mac OS 9 and earlier)
  • cp437 (used in MS-DOS and some versions of the Windows command prompt)

        For more details, check out the documentation here .

5. Sketch

        Sketch is an AI coding assistant designed for users of the pandas library in Python. It uses machine learning to understand the context of your data and provide relevant code suggestions, which makes data manipulation and analysis tasks easier and more efficient. Sketch doesn't require any additional plugins in your IDE, so getting started is quick and easy. It can noticeably reduce the time and effort required for data-related tasks and help you write better code.

5.1 Installation

!pip install sketch 

5.2 Examples

        To use this library, import sketch and then call its methods through the .sketch accessor that it adds to pandas DataFrames.

5.2.1 .sketch.ask

        ask is a feature of Sketch that allows users to ask questions about their data in a natural language format. It provides text-based responses to user queries.

# Importing libraries
import sketch
import pandas as pd

# Reading the data (using Twitter data as an example)
df = pd.read_csv("tweets.csv")
print(df)

# Asking which columns are category type
df.sketch.ask("Which columns are category type?")

# Asking for the shape of the dataframe
df.sketch.ask("What is the shape of the dataframe")

5.2.2 .sketch.howto

        howto is a feature that returns a block of code you can use as a starting point for various data-related tasks. We can ask for code snippets to normalize data, create new features, plot data, and even build models. This saves time because you can copy and adapt the generated code instead of writing it from scratch.

# Asking for code that finds the shape of the dataframe
df.sketch.howto("How do I find the shape of the dataframe?")

5.2.3 .sketch.apply

        The .apply function helps generate new features, parse fields, and perform other data manipulations. To use this feature, we need an OpenAI account and an API key. I haven't tried this feature yet.

        I love using this library, and I find .howto especially useful.

        Check out the Sketch GitHub repository for more information.

6. pgeocode

        pgeocode is an excellent library that I recently came across, and it has been very useful for my spatial analysis projects. For example, it can find the distance between two postal codes, and it returns geographic information for a given country and postal code.

6.1 Installation

!pip install pgeocode

6.2 Examples

        Get geographic information for specific postal codes:

# Importing the library
import pgeocode

# Selecting the country: India (ISO country code "IN")
nomi = pgeocode.Nominatim('IN')

# Getting geographic information by passing the postal codes
nomi.query_postal_code(["620018", "620017", "620012"])

        pgeocode can also calculate the distance between two postal codes, taking the country and the two postal codes as input. The result is in kilometers.

# Finding the distance between two postal codes
distance = pgeocode.GeoDistance('IN')
distance.query_postal_code("620018", "620012")

For more information, check here .
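        To give a rough idea of how a distance in kilometers can be derived from two latitude/longitude pairs, here is a minimal sketch of the haversine (great-circle) formula. It is an illustration only, not pgeocode's own code; the function name, the Earth radius constant, and the sample coordinates are my assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical points: one degree of longitude apart on the equator
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 1))  # 111.2
```

        A postal-code distance then comes down to looking up the coordinates of each code and applying a formula like this one.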

7. rembg

rembg is another useful library for easily removing backgrounds from images.

7.1 Installation

!pip install rembg 

7.2 Examples

# Importing libraries
from rembg import remove
import cv2

# Path of the input image (my file: image.jpeg)
input_path = 'image.jpeg'

# Path for the output image; PNG is used because JPEG cannot store transparency
output_path = 'output.png'

# Reading the input image
input_image = cv2.imread(input_path)

# Removing the background
output_image = remove(input_image)

# Saving the result
cv2.imwrite(output_path, output_image)

        You may already be familiar with some of these libraries, but for me Sketch, Pendulum, pgeocode, and ftfy are essential for my data engineering work. My project relies heavily on them.

 

You might also like my previous articles, Five Cool Python Libraries for Data Science and Six Cool Python Libraries I've Met Recently.

Origin blog.csdn.net/gongdiwudu/article/details/132006901