Kangas: The Pandas of Computer Vision

Introduction

In the field of computer vision, Kangas is an increasingly popular tool for processing and analyzing image data. Just as Pandas changed the way data analysts work with tabular data, Kangas aims to do the same for computer vision tasks.

Kangas is an open-source tool from Comet ML for exploring, analyzing, and visualizing large-scale multimedia datasets such as images, video, and audio. It enables machine learning professionals to visualize, sort, group, query and interpret their data (structured or unstructured) to gain meaningful insights and accelerate model development.

Pandas, on the other hand, is a popular open-source Python library for analyzing and manipulating tabular data, and it is also widely used for data cleaning and preparation. It is easy to use, fast and flexible, but unlike Kangas it does not natively support unstructured data types such as images.

Kangas is to computer vision data what Pandas is to tabular data. Kangas provides methods for reading, manipulating, and analyzing images, as we'll see in several examples throughout this tutorial.

Advantages of Kangas

    • Ease of use: The main advantage of Kangas is that it simplifies working with computer vision data. Its user-friendly API lets data professionals quickly load, process and analyze visual data without writing complex code, so they can focus on the task at hand rather than on the technical details of data processing.

    • Speed and efficiency: Kangas is designed to handle large datasets and process them quickly, which supports fast, interactive analysis and decision-making. This can make it useful in time-sensitive work, such as developing models for autonomous vehicles, where fast and accurate analysis of visual data is critical.

    • Versatility: Kangas can be used to explore the data and model results of many computer vision tasks, such as image classification, object detection, and image segmentation.

    • Ability to handle large amounts of data: Kangas uses memory-efficient data structures that let data professionals work with large amounts of image and video data efficiently. This makes it well suited to high-resolution image and video data.

    • Flexibility: Kangas can run in several environments: inside Jupyter notebooks, as a standalone application, or as a web application.

Read CSV files with Kangas

Reading data from CSV files is very similar in Kangas and Pandas. The difference is that Kangas creates a DataGrid whereas Pandas creates a DataFrame. The code below shows how to read data from a CSV file into a DataGrid:

import kangas as kg
dg = kg.read_csv("path_to_csv_file")

Compare this with the equivalent Pandas code for reading a CSV file:

import pandas as pd
df = pd.read_csv("path_to_csv_file")

Next, we will visualize the data in the CSV file using the following code:

dg.show()

output:

[Figure: Kangas visualization of CSV data files]

The closest Pandas equivalent is:

df.head()

Note that Kangas' DataGrid is interactive whereas Pandas' DataFrame is static.

Read image files with Kangas

Compared with computer vision libraries such as OpenCV, reading image files with Kangas follows the same simple, Pandas-like style, so data scientists can focus their effort where it matters.

To read an image file with Kangas, run the following code block:

import kangas as kg
# Load a single image file and convert it to a PIL image for display
image = kg.Image("path_to_image_file").to_pil()

To display the image in a Jupyter notebook, enter the variable name "image" as the last line of a cell:

image

output:

[Figure: Potato image displayed using Kangas]

As this example shows, the syntax of Kangas is very similar to that of Pandas.

Similarities Between Pandas and Kangas

    • Syntax: Kangas and Pandas have similar syntax and are easy to write and use.

    • Data processing: Both Kangas and Pandas have data processing functions, and both can load data from common file formats such as CSV, JSON and XLSX (Excel). Kangas stores data in a DataGrid, while Pandas uses DataFrames and Series.

    • Data manipulation: Both Kangas and Pandas allow users to filter, sort, merge, and reshape data, but Kangas does it interactively.

    • Indexing: Both libraries allow users to index and select data based on labels or conditions. In Pandas this is done with the loc (label-based) and iloc (position-based) indexers, while in Kangas it is done through the DataGrid.

    • Data Analysis: Both libraries provide basic data analysis methods such as descriptive statistics, aggregation and grouping operations.
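To make these shared operations concrete without relying on either library's API, here is a minimal plain-Python sketch that filters, sorts, groups and aggregates a few rows. The rows and their fields (file, label, score) are hypothetical. In Pandas or Kangas, the same steps would run against a DataFrame or a DataGrid rather than a list of dictionaries.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per image, with a predicted label and a confidence score
rows = [
    {"file": "img_0.png", "label": "healthy", "score": 0.91},
    {"file": "img_1.png", "label": "blight", "score": 0.42},
    {"file": "img_2.png", "label": "healthy", "score": 0.67},
    {"file": "img_3.png", "label": "blight", "score": 0.88},
]

# Filter: keep rows whose score exceeds a threshold
confident = [r for r in rows if r["score"] > 0.5]

# Sort: highest score first
ranked = sorted(confident, key=lambda r: r["score"], reverse=True)

# Group and aggregate: row count and mean score per label
groups = defaultdict(list)
for r in rows:
    groups[r["label"]].append(r["score"])
summary = {label: (len(scores), mean(scores)) for label, scores in groups.items()}

print([r["file"] for r in ranked])
print(summary)
```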

Differences Between Kangas and Pandas

    • Kangas can natively handle image files, while Pandas cannot.

    • Kangas provides an interactive user interface for manipulating a DataGrid, whereas Pandas supports only programmatic manipulation.

Create a Kangas DataGrid

A Kangas DataGrid is stored in an SQLite database, which lets it store and display large amounts of data and run fast, complex queries. A DataGrid can be saved, shared, and even served remotely.
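Because a DataGrid is backed by SQLite, its filtering and sorting ultimately become database queries. The sketch below illustrates the idea with Python's built-in sqlite3 module. The table name, the column names and the data are all hypothetical and do not reflect Kangas's actual internal schema.

```python
import sqlite3

# In-memory database standing in for a saved grid file (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grid (idx INTEGER, truth INTEGER, guess INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO grid VALUES (?, ?, ?, ?)",
    [(0, 1, 1, 0.93), (1, 0, 2, 0.55), (2, 2, 2, 0.81), (3, 1, 0, 0.47)],
)

# Filter and sort in one query: misclassified rows, most confident first
wrong = conn.execute(
    "SELECT idx, truth, guess, score FROM grid "
    "WHERE truth != guess ORDER BY score DESC"
).fetchall()
print(wrong)
conn.close()
```

Running a query like this lets the database do the filtering work, so the interface only has to display the rows that come back.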

Some key features of Kangas DataGrid include:

    • Lazy Loading: Kangas DataGrid only loads data when it is needed, which makes it well suited to displaying large datasets.

    • Filtering and sorting: Users can filter and sort the data displayed in the grid based on various criteria.

    • Cell Editing: Users can edit individual cells in the grid, and those changes can be saved back to the underlying data source.

    • Column resizing and reordering: Users can resize and reorder columns in the grid.

    • Virtual Scrolling: Kangas DataGrid supports virtual scrolling, meaning only visible rows are rendered in the DOM, which significantly improves performance.
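Virtual scrolling and lazy loading both depend on the same calculation: work out which rows are in view, then fetch and render only those. The sketch below shows that calculation. The function names, the fixed row height and the overscan parameter are illustrative assumptions and are not Kangas's implementation.

```python
import math


def visible_range(scroll_top, viewport_height, row_height, total_rows, overscan=2):
    """Return the half-open range (start, stop) of row indices to render.

    Assumes every row has the same height in pixels. `overscan` adds a few
    extra rows above and below the viewport so that scrolling looks smooth.
    """
    first = scroll_top // row_height
    last = math.ceil((scroll_top + viewport_height) / row_height)
    start = max(0, first - overscan)
    stop = min(total_rows, last + overscan)
    return start, stop


def load_rows(fetch, start, stop):
    # Lazy loading: request only the rows in view from the data source
    return [fetch(i) for i in range(start, stop)]


# 1,000,000 rows of 30 px each, seen through a 600 px viewport scrolled to 3,000 px
start, stop = visible_range(3000, 600, 30, 1_000_000)
rows = load_rows(lambda i: f"row {i}", start, stop)
print(start, stop, len(rows))
```

Only 24 of the million rows are fetched here. That is why render cost stays roughly constant as the dataset grows.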

Kangas DataGrid is easy to customize and configure, allowing developers to tailor its design and functionality to the needs of their specific application.

Creating a Kangas DataGrid is straightforward for tabular data but takes a few more steps for image data. For tabular data, simply read the CSV file with Kangas to create a DataGrid:

dg = kg.read_csv("/path_to_csv_file")
dg.show()

For image data, here is the step-by-step procedure for creating a DataGrid:

    • First, collect data or download it from a data repository such as Kaggle. Split the data into X_train, X_test, y_train and y_test partitions.

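from sklearn.model_selection import train_test_split

# data: array of images; labels: one-hot encoded class labels (both prepared beforehand)
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42)
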
    • Next, train the model.

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.applications.mobilenet import MobileNet

num_classes = 3  # three classes, matching the score_0..score_2 columns below

# Define the model: a MobileNet backbone pretrained on ImageNet plus a small classification head
model = Sequential([
    MobileNet(include_top=False,
              input_shape=(150, 150, 3),
              weights="imagenet",
              pooling="avg"),
    Dense(128, activation="relu"),
    Dropout(0.25),
    Dense(num_classes, activation="softmax"),
])

model.summary()

# Compile the model; categorical cross-entropy expects one-hot labels and softmax outputs
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)

# Fit the model on the in-memory arrays
batch_size = 20
history = model.fit(
    X_train, y_train,
    batch_size=batch_size,
    epochs=10,
    validation_data=(X_test, y_test),
)

    • Create and save a Kangas DataGrid.

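from kangas import DataGrid, Image

dg = DataGrid(
    name="potato-tuber",
    columns=[
        "Epoch",
        "Index",
        "Image",
        "Truth",
        "Output",
        "score_0",
        "score_1",
        "score_2",
    ],
)

# Wrap each test image for display; the shape matches the model's 150x150 RGB inputs
images = [Image(x, shape=(150, 150, 3)) for x in X_test]

# Log the trained model's predictions on the test set once
outputs = model.predict(X_test)
epoch = 10  # the model was fit for 10 epochs above
for index in range(len(X_test)):
    truth = int(y_test[index].argmax())   # true class from the one-hot label
    guess = int(outputs[index].argmax())  # predicted class from the softmax scores
    dg.append([epoch, index, images[index], truth, guess]
              + [float(score) for score in outputs[index]])

# Write the DataGrid to disk so it can be explored and shared later
dg.save()
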
    • Explore and share DataGrid.

After creating the DataGrid, locate the path where it was saved and copy it. Then run the following command to explore the DataGrid:

kg.show('/path_to_datagrid/')

output:

[Figure: the saved DataGrid explored in Kangas]
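The loop that fills the DataGrid turns each prediction into one row. The true class is the index of the largest value in the one-hot label. The predicted class is the index of the largest score. The raw scores fill score_0 to score_2.

Here is a library-free sketch of that step. The label and score values are hypothetical stand-ins for y_test and the output of model.predict, and a placeholder string takes the place of the Image object.

```python
def argmax(values):
    # Index of the largest value (the first one wins on ties)
    return max(range(len(values)), key=lambda i: values[i])


def build_rows(epoch, labels, outputs, images):
    """Build rows in the column order Epoch, Index, Image, Truth, Output, score_0, ..."""
    rows = []
    for index, (label, scores) in enumerate(zip(labels, outputs)):
        truth = argmax(label)   # true class from a one-hot label
        guess = argmax(scores)  # predicted class from the model's scores
        rows.append([epoch, index, images[index], truth, guess]
                    + [float(s) for s in scores])
    return rows


# Hypothetical one-hot labels and softmax scores for two test images
labels = [[0, 1, 0], [0, 0, 1]]
outputs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
for row in build_rows(10, labels, outputs, ["img_0", "img_1"]):
    print(row)
```

In the second row, Truth and Output differ. Misclassified rows like this are the kind you would typically sort or filter for in the DataGrid UI.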

Conclusion

Kangas is on its way to becoming the Pandas of computer vision data processing and analysis. Its user-friendly API, speed and efficiency make it a valuable tool for data scientists and computer vision experts. Whether you are working on an autonomous driving project or simply analyzing data for research, Kangas is a strong choice for getting the job done.

Source: blog.csdn.net/weixin_38739735/article/details/130776454