Summary of recent AI information articles (for personal use)


1. AIGC

1.1 AIGC

AIGC definition: a collective term for producing, manipulating, and modifying data or media with artificial intelligence algorithms.
"The Document is All You Need! A one-stop introductory guide to the underlying technical principles of LLMs"
Datawhale: a library of high-quality AIGC application cases
"Fully integrated! ChatGPT has entered 15 commercial applications, letting AI work for you" (2023.5.9)
"Deep Thinking: In the AI era, which of your abilities will be amplified a thousandfold?" (2023.5.8)
AIGC unified model "Versatile Diffusion" (paper, code): a unified multi-flow multimodal diffusion framework

Besides ordinary text-to-image generation, it can generate variations of an input image, generate text from an image, generate variations of input text, edit images with disentangled semantics and style, generate images guided jointly by an input image and text, edit image content in latent space, and more. Future versions are planned to support more modalities such as speech, music, video, and 3D. According to the paper, VD and its underlying framework offer the following advantages:
- All subtasks are handled with competitive quality.
- It supports new extensions and applications, such as disentangling image style from semantics and dual image-text guided generation.
- These experiments and applications provide richer semantic insight into the generated outputs.

1.2 ChatGPT series

Code Interpreter: "The most powerful feature launched on ChatGPT: skilled at analyzing data and generating code"
"ChatGPT User Manual"
Andrew Ng's new course is here: learn step by step how to build applications with the ChatGPT API!
Crushing LLaMA, "Falcon" goes fully open source! 40 billion parameters, trained on a trillion tokens, topping the Hugging Face leaderboard
How to train AIGC models for private deployment quickly and at low cost?
BenTsao (Materia Medica, formerly HuaTuo): a LLaMA model fine-tuned on Chinese medical knowledge
AutoGPT: a foolproof AutoGPT tutorial plus hands-on experience! See also related articles and a free trial on Gitpod.
BingChat: "Microsoft Bing suddenly gets a huge update! No waitlist, everyone can use it, and answers come with text and images"
MetaGPT: a multi-agent framework whose agents play different roles: engineer, product manager, architect, and project manager. It is similar to AutoGPT but tailored to product requirements, design, competitive analysis, APIs, and documentation. Given a one-sentence requirement from the boss, it outputs the entire workflow of a software company, along with carefully orchestrated SOPs.

1.3 Google Series

On May 11, 2023, Google released a range of products at its 2023 I/O conference. For details, see "Google's Comprehensive Counterattack on ChatGPT! PaLM 2 and Gemini double kill, Bard officially opened":
- Officially released the large model PaLM 2, which already powers 25 Google products. (Google has not disclosed its parameter count; the 540-billion-parameter figure belongs to the original PaLM.)
- Announced Gemini, its next-generation multimodal foundation model from Google DeepMind, which is still in training. The model accepts multiple input modalities and is designed to integrate efficiently with tools and APIs, so developers can build on it.
- Google's chatbot Bard is now fully open to the public.
- Google Workspace launches Duet AI, an AIGC toolkit now open for trial (Google's counterpart to Microsoft's Copilot).
Officially launched, GitHub opens new code search engine to all users

1.4 CV

1.4.1 Stable Diffusion

1.4.2 Smart CV

Photoshot: generate personalized avatars
Lama Cleaner: one-click image cleanup (inpainting to remove unwanted objects)
AutoCut: quickly cut videos by editing their subtitles

1.5 Code generation

Amazon CodeWhisperer, an AI code generator: introduction video (by Autumn Leaves), official website registration
codeinterpreter-api: an open-source implementation of the ChatGPT Code Interpreter
CodeGeeX2-6B: the second-generation model of CodeGeeX, a multilingual code generation model, built on the ChatGLM2 architecture with additional code pre-training.
Copilot Chat: the latest VS Code extension; Copilot Chat, GitHub Copilot's new AI chat assistant, is here!

1.6 Domestic

"ChatGPT Academic Professional Edition (Chinese Academy of Sciences)": aimed at the daily research work of the Chinese Academy of Sciences, this project customizes a set of practical ChatGPT-based functions to streamline academic research and development workflows. Built-in tools include, but are not limited to: one-click polishing of academic papers and grammar error checking; fast Chinese-English translation; one-click code explanation; customizable shortcut keys; modular design for advanced experiments; self-analysis of the project's source code; and intelligent paper reading with abstract generation.
Llama2-Chinese: an advanced technical community focused on optimizing Llama models for Chinese and building on top of them, covering Chinese corpora, model deployment, fine-tuning, and more.
Colossal-AI: billed as the world's largest and most active large-model development tool and community, it provides out-of-the-box LLaMA2 training, fine-tuning, and inference solutions scaling from 8 to 512 GPUs, speeds up 70-billion-parameter training by 195%, and offers a one-stop cloud platform that greatly lowers the cost of developing and deploying large models.

1.7 Chrome extensions

"Recommended ChatGPT extensions: boost your efficiency 100-fold!"
Immersive Translate: currently the most useful translation extension. It translates any web page with one click while keeping the original English text alongside, so you can compare Chinese and English easily. It can even translate English subtitles in videos.
- Export bilingual e-books with one click and support real-time bilingual translation of PDF, subtitles, TXT and other files.
- Innovative mouse-over translation - Just hover your mouse over any paragraph on any web page and the corresponding translation will immediately appear below the paragraph.
- In-depth customization and optimization of mainstream websites - Optimize mainstream websites such as Google, Twitter, Reddit, YouTube, Bloomberg, and Wall Street Journal to make searching, social networking, and obtaining information smoother and more efficient.
WebChatGPT: gives ChatGPT internet access and integrates a variety of prompts. Note: turn this feature off when you don't need web search; otherwise every prompt you send will also be used as a web search query.
ChatGPT File Uploader: after installation, the ChatGPT page gets an extra "Submit File" button for uploading files. A similar extension is ChatGPT File Uploader Extended.
ChatGPT to Markdown: after installation, a small M icon appears in the upper right corner of the ChatGPT conversation page. Clicking it opens a window that shows the conversation in Markdown and updates in real time, so newly added messages appear in the window immediately. This is very handy for anyone who frequently asks questions and writes blog posts.
MaxAI.me: use ChatGPT on any web page. Select a piece of text on any page, then pick the action you need (translate, summarize, continue writing, explain, run...); it supports various mainstream AI models. A similar tool is Monica (AI copilot).
Web2Markdown : Converts the content of the current page to Markdown format. This conversion preserves the text, titles, links, images, and other elements of the web page, allowing you to save and share your website content in a concise and readable way.

(Screenshot: the Web2Markdown extension in use)

As the screenshot above shows, after installation the extension is pinned to the browser's extension bar; the page shown is my CSDN blog. Click the Web2Markdown icon and, within a few seconds, a window pops up with the page's Markdown, which you can copy with one click.

YouTube Summary with ChatGPT & Claude: in the AI era, the fastest way to watch a video is not 2x, 3x, or even 10x speed, but letting AI watch it for you. After installation, an extension icon appears on YouTube. Click "Transcript & Summary" and a pop-up generates the full video content in seconds, in a language of your choice. Each paragraph carries a timestamp; clicking it jumps the video to that position. Click the ChatGPT icon to send the content to ChatGPT, and the rightmost button copies the full video transcript.
Mr.-Ranedeer-AI-Tutor: the author turned ChatGPT into a professional AI tutor with a 7,800-token prompt that guides you through course study. To me, it works just like an extension. Continue the conversation through the author's shared link, then first enter /language Simplified Chinese to switch to Chinese mode. /config sets the tutor type, /plan sets the course plan, and /start begins the lesson.

2. Deep learning

2.1 NLP

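"RoPE may be the ResNet of the LLM era": ResNet solved the vanishing-gradient problem that appears as convolutional networks get deeper, which let deep models shine. The article argues that RoPE (rotary position embedding) plays a similar role for long contexts: it encodes position by rotating the query and key vectors, so attention scores depend only on the relative distance between tokens, which helps LLMs keep tokens associated when the context grows very long.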
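To make the rotation idea concrete, here is a minimal NumPy sketch of rotary position embedding. It assumes the interleaved pairing of features from the original RoFormer formulation and the common base of 10000 (some implementations pair the first and second halves of the vector instead); the helper names rope and score are illustrative, not taken from the article. The demo checks the key property: the query-key score depends only on the offset between positions.

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive feature pairs of x (shape: seq_len x dim, dim even).

    Pair i of the token at position p is rotated by the angle p * base**(-2i/dim).
    """
    _, dim = x.shape
    if dim % 2:
        raise ValueError("RoPE needs an even feature dimension")
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # theta_i, shape (dim/2,)
    angles = np.outer(positions, inv_freq)              # shape (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty(x.shape, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin           # 2-D rotation of each pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def score(q: np.ndarray, k: np.ndarray, m: int, n: int) -> float:
    """Attention logit between a query at position m and a key at position n."""
    qm = rope(q[None, :], np.array([m]))[0]
    kn = rope(k[None, :], np.array([n]))[0]
    return float(qm @ kn)

rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
# Same offset (m - n = 2) at very different absolute positions gives the same score.
print(score(q, k, 3, 1), score(q, k, 103, 101))
```

In a real model this rotation is applied to the query and key of every attention head before the dot product; values are left unrotated. Because each pair is only rotated, vector norms are preserved.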

2.2 CV

"Overview | The current status, trends and future directions of visual Transformers in CV"
Kaggle knowledge points: YOLOv5 hyperparameter optimization
Video PreTraining (VPT) applies imitation learning (in the reinforcement learning family): after watching 70,000 hours of gameplay videos, an AI learned to perform various tasks in Minecraft, and Minecraft has become a testing ground for AI.
Kaggle knowledge points: Common semantic segmentation losses

2.3 LLM

"LLM Survey"
"FreeWilly: Beats Llama 2, Rivals GPT-3.5"
"LangChain: Chat with Your Data": how to use LLMs with LangChain to build a Q&A system and chatbot over private data
[LLM series: instruction fine-tuning] In brief: the "prompts" used in large-model instruction fine-tuning
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs: to improve the tool-use abilities of open-source LLMs, the authors introduce ToolLLM, a general tool-use framework covering data construction, model training, and evaluation.
Open-source model OpenChat surpasses ChatGPT
Claude 2.0: "Claude 2.0 is here", "Claude 2 in-depth experience", "Free, no VPN needed! Worth bookmarking: a five-step, spoon-fed tutorial for a local version of Claude 2"
Build a personal knowledge base based on Quivr
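The private-data Q&A systems in the LangChain course above follow a retrieval-augmented pattern: split documents into chunks, find the chunks most relevant to a question, and pass them to the LLM as context. Below is a dependency-free toy sketch of that flow. It does not use LangChain's API; bag-of-words cosine similarity stands in for the embedding model and vector store a real system would use, and all function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def split_into_chunks(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows, as a text splitter would."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts: a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a prompt that would be sent to an LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"

docs = [
    "Our office is open Monday to Friday from 9 am to 6 pm.",
    "Refunds are processed within 14 days after the returned item arrives.",
    "The VPN must be enabled before accessing the internal wiki.",
]
chunks = [c for d in docs for c in split_into_chunks(d)]
top = retrieve("How long do refunds take?", chunks, k=1)
print(build_prompt("How long do refunds take?", top))
```

The resulting prompt would then go to an LLM. Swapping vectorize and cosine for real embeddings changes retrieval quality, not the shape of the pipeline.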

2.4 Neural Network

Kaggle knowledge point: R-Drop regularization: because dropout introduces randomness, training and inference behave somewhat inconsistently, which can hurt model performance and robustness. For each training sample, R-Drop adds to the usual task loss a term that minimizes the bidirectional KL divergence between the output distributions of two sub-models sampled by dropout. This makes the outputs under different dropout samples more consistent in distribution, reducing the gap between training and inference.
"Summary of GPU multi-card parallel training (taking pytorch as an example)"
The most complete deep learning hyperparameter tuning guide! (PDF attached)
Neural network training tricks
Kaggle knowledge points: deep learning coding conventions
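The R-Drop entry above can be sketched in NumPy: the same batch goes through the model twice with different dropout masks, and the loss is the cross-entropy of both passes plus alpha times the symmetric KL divergence between the two output distributions. The toy linear model (forward_with_dropout), the dropout rate, and alpha=1.0 are illustrative assumptions, not the Kaggle article's code; in a framework such as PyTorch you would run the real model twice in training mode and compute the same terms.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """KL(p || q) for each row (sample), summed over classes."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def r_drop_loss(logits1, logits2, labels, alpha: float = 1.0) -> float:
    """Cross-entropy of both dropout passes plus alpha * symmetric KL between them."""
    p1, p2 = softmax(logits1), softmax(logits2)
    rows = np.arange(len(labels))
    nll = -np.log(p1[rows, labels] + 1e-12) - np.log(p2[rows, labels] + 1e-12)
    sym_kl = 0.5 * (kl(p1, p2) + kl(p2, p1))
    return float(np.mean(nll + alpha * sym_kl))

def forward_with_dropout(x, w, rate, rng):
    """Hypothetical toy linear classifier with inverted dropout on its inputs."""
    keep = rng.random(x.shape) >= rate
    return (x * keep / (1.0 - rate)) @ w

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w = rng.standard_normal((16, 3))
labels = np.array([0, 2, 1, 0])
# Two forward passes over the same batch get different dropout masks.
logits1 = forward_with_dropout(x, w, 0.3, rng)
logits2 = forward_with_dropout(x, w, 0.3, rng)
print(r_drop_loss(logits1, logits2, labels, alpha=1.0))
```

When both passes agree exactly, the KL term vanishes and the loss reduces to the two cross-entropy terms; alpha controls how strongly disagreement between dropout samples is penalized.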

3. Machine Learning

"Convert a notebook into a dynamic slide web page with one command"
High Performance Computing Chinese Translation Project
AutoML: Automated Machine Learning. "20 Must-Know Automated Machine Learning Libraries (Python)"
A must-have tool: JupyterLab 4.0 is released!

4. Competition

STI competition, task 2: answer verification baseline and approach sharing, with complete code
Kaggle ICR competition: basic LightGBM approach
Summary of Kaggle competition questions: Stable Diffusion
Text classification fine-tuning practical skills
Kaggle competition summary: ICR disease prediction (data mining): predict whether a person has any of three specific medical conditions from 56 anonymized health features, a binary classification task.
iFlytek: malignant cell detection baseline (object detection)
iFlytek: multi-label image retrieval baseline (multimodal): build a model that takes multiple text labels as input and retrieves images containing the labeled content.
iFlytek: remote sensing rotated ship detection baseline (object detection)

5. Tools

"The end of the 1.x era: PyTorch 2.0 is coming! 100% backward compatible, one line of code speeds up training by 76%"
"Chapyter": ChatGPT moves into Jupyter for one-stop natural-language programming
"jupyter-ai", "First look at Jupyter AI: magic commands and a knowledge base to play with"
Sider, a powerful personal AI assistant:
- Sider lets users on networks in mainland China directly try excellent large models such as ChatGPT, New Bing, Bard, and Painter. It supports GPT-3.5 and GPT-4, and its interface fully supports Chinese.
- It provides nearly 150 ready-made AI assistants covering finance, personal growth, work, communication, computer science, technology, health, science, job hunting, writing, academics, art, entertainment, languages, and more. You can also train an assistant tailored to your own needs.
FLOWGPT, an open-source prompt learning and exchange community, will soon launch a Chinese version.
Pandas 2.0: a game changer for data scientists
Deploy your model using Streamlit