2023 Google Developer Conference – Technical updates in the field of AI

Conference introduction

The Google Developer Conference is an annual event where Google showcases its latest products and platforms to developers and technology enthusiasts. The 2023 conference (Google I/O Connect | China) offered developers rich learning resources, hands-on practice, and live demonstrations, along with opportunities to interact with Google experts and exchange ideas with other developers. Its aims: improve development efficiency, unleash team creativity, simplify workflows, serve developers with open, integrated solutions, and jointly build an innovation ecosystem for a better future.

Easily implement on-device machine learning with MediaPipe

MediaPipe and on-device machine learning

MediaPipe is a low-code/no-code framework for building and deploying cross-platform, on-device machine learning solutions. It lets you integrate machine learning into your mobile, web, and IoT applications.

On-device machine learning runs models directly on the user's device, such as a smartphone or web browser, without sending user data to a server for processing.

How MediaPipe solves gesture recognition

A gesture recognizer takes an image as input and returns the gesture found in it, such as a thumbs-up. This task actually chains four different machine learning models together:

  • ① Detect the hand in the image
  • ② Detect the hand's key points (landmarks)
  • ③ Create an embedding vector for the gesture
  • ④ Classify that embedding as, for example, a thumbs-up
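The four stages above can be pictured as functions chained into a pipeline. The sketch below is purely illustrative (toy stand-ins, not MediaPipe's real models), just to show how each stage's output feeds the next:

```python
# Illustrative sketch (not MediaPipe's real implementation): the four stages
# chained as plain Python functions, with toy stand-ins for the models.

def detect_hand(image):
    # Stage 1: find the hand's bounding box (toy: assume the whole image).
    return {"box": (0, 0, image["width"], image["height"])}

def detect_keypoints(image, hand):
    # Stage 2: locate key points (landmarks) inside the hand region.
    x0, y0, x1, y1 = hand["box"]
    return [((x0 + x1) / 2, (y0 + y1) / 2)] * 21  # 21 hand landmarks

def embed_gesture(keypoints):
    # Stage 3: turn the landmarks into a fixed-length embedding vector.
    return [x + y for (x, y) in keypoints]

def classify_gesture(embedding):
    # Stage 4: map the embedding to a gesture label (toy threshold rule).
    return "Thumb_Up" if sum(embedding) > 0 else "None"

def recognize(image):
    # The "pipeline": each stage feeds the next.
    hand = detect_hand(image)
    keypoints = detect_keypoints(image, hand)
    embedding = embed_gesture(keypoints)
    return classify_gesture(embedding)

print(recognize({"width": 640, "height": 480}))  # Thumb_Up
```

In the real solution each stage is a trained neural network, but the hand-off pattern between stages is the same.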

Beyond the models themselves, there are other concerns, such as running the whole pipeline efficiently on the GPU and handling the unfamiliar differences between platforms.

But MediaPipe abstracts away these complexities and provides a pipeline that connects the models for you, so you don't need to coordinate them yourself; simple API calls are all that's required.
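As a hedged sketch of what that "simple API interaction" looks like, here is the MediaPipe Tasks Python API for gesture recognition. The model file `gesture_recognizer.task` and the input image `hand.jpg` are assumptions (the model must be downloaded separately), so the API calls are guarded and only run when the model file is present:

```python
# Hedged sketch: gesture recognition via the MediaPipe Tasks Python API.
import os

def top_gesture(gestures):
    # Pure helper: pick the highest-scoring (category_name, score) pair
    # from the first detected hand; "None" if no hand was found.
    if not gestures:
        return "None"
    name, _ = max(gestures[0], key=lambda pair: pair[1])
    return name

if os.path.exists("gesture_recognizer.task"):  # assumed, downloaded model
    import mediapipe as mp
    from mediapipe.tasks import python as mp_python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=mp_python.BaseOptions(
            model_asset_path="gesture_recognizer.task"))
    with vision.GestureRecognizer.create_from_options(options) as recognizer:
        image = mp.Image.create_from_file("hand.jpg")  # hypothetical input
        result = recognizer.recognize(image)
        print(top_gesture([[(c.category_name, c.score) for c in hand]
                           for hand in result.gestures]))
```

All four underlying models run behind that single `recognize()` call.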

What platforms does MediaPipe currently support?

MediaPipe currently supports Android, Web, and Python; iOS support will launch soon.

MediaPipe Studio

MediaPipe Studio is a web application that lets you try all of MediaPipe's on-device machine learning solutions directly in the browser. For gesture recognition, two-hand support is being added and will launch soon. Beyond gesture recognition, MediaPipe Studio also offers other machine learning solutions, such as image segmentation, face detection, and text, audio, and video solutions.

Customize solutions to fit your own use cases

This can be done with MediaPipe Model Maker, a library built from the ground up for customizing the solutions MediaPipe provides.

Taking solving the gesture recognition problem as an example, the steps are as follows:

  1. Collect a training dataset of hands making three gestures: rock, paper, and scissors.
  2. Once you have the dataset, use Model Maker to train a custom model that recognizes these gestures.
  3. Train faster with a free GPU in Google Colab.
  4. First, import the gesture recognizer module.
  5. Then load the rock-paper-scissors dataset and start training the custom model.
  6. Check the model's accuracy on a test dataset it never saw during training.
  7. Finally, export the model for on-device deployment with MediaPipe Tasks.
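Steps 4-7 can be sketched with the MediaPipe Model Maker gesture recognizer API. The dataset folder `rps_data` is an assumption: Model Maker expects one subfolder of images per label (`rock/`, `paper/`, `scissors/`) plus a `none` background class, so the training calls are guarded and only run if that folder exists:

```python
# Hedged sketch of training a custom gesture model with MediaPipe Model Maker.
import os

def dataset_labels(subfolders):
    # Pure helper: the label set derived from the dataset folder layout;
    # a "none" background class is required alongside the gesture classes.
    labels = sorted(subfolders)
    assert "none" in labels, "Model Maker expects a 'none' background class"
    return labels

if os.path.isdir("rps_data"):  # assumed, user-collected dataset
    from mediapipe_model_maker import gesture_recognizer

    # Step 5: load the rock-paper-scissors dataset and split it.
    data = gesture_recognizer.Dataset.from_folder(
        dirname="rps_data",
        hparams=gesture_recognizer.HandDataPreprocessingParams())
    train_data, rest = data.split(0.8)
    validation_data, test_data = rest.split(0.5)

    # Train the custom model (step 6: evaluate on unseen test data).
    model = gesture_recognizer.GestureRecognizer.create(
        train_data=train_data,
        validation_data=validation_data,
        hparams=gesture_recognizer.HParams(export_dir="exported_model"))
    loss, accuracy = model.evaluate(test_data)

    # Step 7: export for on-device deployment with MediaPipe Tasks.
    model.export_model()  # writes a .task file under exported_model/
```

The exported `.task` file drops straight into the same Tasks API used for the stock gesture recognizer.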

MediaPipe Studio lets us try these solutions in a web browser to get inspiration for integrating on-device machine learning into applications, and many of them can be customized with your own datasets.

Developments in the field of AI models

  • First, a technique called model distillation can compress large general-purpose models into smaller ones that run on device and specialize in a narrower set of tasks. One result is an experimental on-device face stylization solution that can, for example, turn your photos into a cartoon style.

  • Second, another generative AI model being tested is a diffusion-based image generation model. MediaPipe packages it as a ready-made on-device solution that can be easily integrated into mobile or web applications; generating images from text prompts in seconds has already been demonstrated on Android phones.
  • Third, large language models can be deployed on Android to handle natural-language tasks, such as summarizing a long conversation or composing a formal email on a given topic. On-device generative AI is still in its early stages, but it will keep improving in the near future.
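The distillation idea in the first bullet can be made concrete with a minimal sketch: the small "student" model is trained to match the soft output distribution of the large "teacher" model, typically via a temperature-scaled cross-entropy loss. This is a generic illustration of the technique, not Google's actual distillation recipe:

```python
# Minimal sketch of the knowledge-distillation objective.
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T gives softer teacher labels.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # Cross-entropy between the teacher's softened distribution and the
    # student's: minimizing it trains the student to mimic the teacher.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum())

# A student that matches the teacher incurs a lower loss than one that
# disagrees, which is what drives the student toward the teacher's behavior.
teacher = [4.0, 1.0, 0.5]
agree = distillation_loss(teacher, [4.0, 1.0, 0.5])
disagree = distillation_loss(teacher, [0.5, 1.0, 4.0])
assert agree < disagree
```

The on-device win is that the student can be orders of magnitude smaller than the teacher while keeping most of its accuracy on the narrow task it was distilled for.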

Summary

The 2023 Google Developer Conference showed us many technologies. MediaPipe abstracts away the complex work of machine learning and provides a pipeline that connects models together. Developers can use this product suite to easily integrate on-device machine learning solutions into applications across platforms (Android, Web, desktop, etc.). At the same time, AI models can now be deployed on Android in an initial form, and more and more of them can help us in daily life. We can also customize parts of a solution with our own training datasets in just a few lines of code, so in the future we may even reach "zero code"!

Readers interested in MediaPipe, machine learning, or other development tools can visit the CSDN special page to watch replays of the keynote and session talks from the 2023 Google Developer Conference and learn about new technologies and cutting-edge cases.

Origin: blog.csdn.net/weixin_53197693/article/details/132735450