230908-MetaGPT technical practice of building exclusive AI Agent-Video Notes

  1. LangChain >>> Concept Overload
  2. MetaGPT: use AI to complete the bootstrapping of GPT.
  3. Meta-programming should be a function, not something done by people.
  4. March to May: reviewed all the open-source projects then available in the industry and 60+ related papers, and wrote 54 sets of notes.
  5. LangChain >>> 96,000 lines of code, 655 classes, 2,826 functions; the notes on it total 11,000 words
  6. Many open-source projects pursue conceptual completeness rather than practicality; MetaGPT aims to get concrete things done.
  7. Agent Protocol: the way agents interact with one another.
  8. There is currently no unified Agent Protocol on the market. A version may appear in the second half of this year; agent technology is accelerating.
  9. Q: How long will it take for MetaGPT to be scaled up and commercialized? A: It requires a process of industrialization.
    · MetaGPT today is like 1903, when the Wright brothers built the first airplane, which could fly for a little over ten seconds >>> the theory of flight was verified.
    · Developing fighter jets, transport aircraft, and so on requires step-by-step industrialization.
    · When will the wooden airplane turn into a metal airplane that can carry people? >>> 18 months >>> from 500 lines of code to 100,000 lines of code, completing project-level code automation
  10. MetaGPT studied many frameworks to learn which parts are necessary and which are not.
  11. MetaGPT is currently at a preliminary stage.
  12. Role 1: meta-programming agent framework; Role 2: pure multi-agent framework
  13. An agent store will be launched soon, offering agents such as MetaGPT
  14. Of the more than 200 planned tasks, 80+ are already on the roadmap
  15. There are many uncertainties in practice, such as how to do testing: different language models may need completely different prompts
  16. Even when the input is the same, it is difficult to get stable output
  17. Writing prompts is similar to writing an SOP (standard operating procedure), and an efficient SOP can be reused across different agents.
  18. In the human world, some leading companies may spend tens of billions of dollars to purchase SOPs for their information systems and key projects.
  19. Developing an SOP is equivalent to writing code
  20. A software company has three components: code + SOP + team. The essence of an SOP is to program the team.
  21. Tool chain: training, fine-tuning, inference, deployment, pruning, distillation, quantization, and all the other techniques: the agent itself is very sensitive to them
  22. Auto-GPT: $460 spent and wasted >>> it lacks SOPs and dedicated models for solving specific problems
  23. How to handle developing and interacting with more roles: automation and agent environments
  24. Development and debugging are currently very difficult, for example the debugging of prompts
  25. The large language model plays the role of the human brain's slow thinking; fast thinking is human intuition. Different sensory organs need corresponding sensory modules; the large language model does not model these senses directly.
  26. In practice, multi-modal modeling is not as good as imagined; whether any single model can achieve complete intelligence remains an open question. Language may be the most important faculty of the brain, but it is not the only one: there are more than a dozen other specialized modules.
  27. Apart from emotions, most of the brain's functions are necessary for an agent.
  28. Short-term memory (record everything that happened today) + long-term memory (vector retrieval) >>> Human memory is hierarchical, and not all memories are equal. Vector retrieval, graph retrieval, tree retrieval, and so on: should several be used together? Which is better?
  29. Visual large language models have to balance performance against cost. Image token consumption is huge (the notes cite inference at 600 fps).
  30. In Google's work the number of agents cannot exceed 5; with more than 5, reinforcement learning can no longer be simulated. Even for specific problems in specific environments, convergence is actually difficult.
  31. Reinforcement learning: asked to boil water in a room and then predict the body temperature, it scores only 2 points out of 100. Reinforcement learning does not understand the language of the world well.
  32. Classic reinforcement learning learns about the world from scratch. Today's agents bring world knowledge into RL.
  33. Q: How is L4 AGI defined, and how can it be achieved? A: 100,000 lines of code >>> API interaction >>> an agent implements 100,000 lines of code on Linux >>> roughly human level
  34. GPT-4 vs. domestic models: Llama 2 can run, but with problems; domestic LLMs can run, but with problems. Solving these problems takes time and work. The waterline of open source keeps rising, and everyone will gradually catch up with GPT-4.
  35. GPT-3.5 Turbo and GPT-4 may be open-sourced in the future.
  36. Agents need to become business agents rather than just software agents; how the two work together is an open question. Organizational form: one group? A hundred groups? A group at the billion scale? The way different agents transact with one another determines the business logic.
  37. HR is an agent, finance is an agent, and the business itself is not an agent. Business requirements are complex, for example a 300-page requirements document. In the future, large requirements will have to be broken down, or tasks decomposed, in a structured way. The capabilities of agents provided by different agent companies are completely different. In the future there will be trading and pricing of agents. Let agents have the mechanical capabilities of domain experts.
  38. Communication and mutual understanding are very expensive for humans, let alone for agents. Agents should be combined with software engineering.
  39. SOP review: more than two hundred years ago, Adam Smith described the division of labor among humans. Division of labor brings two benefits: a career path, which corresponds to training and fine-tuning; and the SOP, which sets the production rhythm on the assembly line (no step can be left out, and each step is standard enough). The SOP is the highest-level planning in the human world, yet SOP and planning are two different things.
  40. Sam Altman: some data needs to be generated as synthetic data. Real-world data has basically been used up, and it can only take us to the current level. As things stand, the headroom for improving GPT-4 is not particularly high. Most of the improvement will come from strategy, not from the model. How to improve: SOPs and thinking strategies.
  41. 5-step working method: 1. Make the requirements less stupid; 2. Remove useless processes or parts; 3. Simplify and optimize; 4. Speed up iteration; 5. Evolve. Some SOPs work particularly well on agents, but the proportion of SOPs that fit effectively is not that high.
  42. Good SOPs and flexibility are actually at odds with each other.
  43. $200 billion in investment in Silicon Valley. MetaAI’s response: open source.
  44. The value of MetaAI: open-source plus closed-source cooperation, providing all the work needed to put AI into practice, and serving domestic Fortune 500 companies.
  45. MetaGPT provides sales and customer service. The overall codebase is small and can be used with partial modifications.
  46. MetaGPT + a large code model >>> more efficient code development
  47. First produce practical results that can be deployed, then recruit people to publish papers together, similar to Google. MetaGPT recruits from the community, with a focus on writing papers. MetaGPT has international influence: dozens of media outlets and influencers have reported on the work, and it is well recognized by developers. 10 lines of valid code is enough.
  48. Everyone in the community can participate in the work of AGI.
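
Items 7, 17, 19, and 20 above (an agent protocol, prompts written like SOPs, SOP development as writing code, the SOP as a way to program the team) can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not MetaGPT's actual API: the names Message, Role, and run_sop, the three roles, and the message kinds are all hypothetical, and the lambdas stand in for real language-model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """A structured hand-off between agents (a stand-in for an agent protocol)."""
    sender: str
    kind: str      # hypothetical message kinds: "requirement", "prd", "design", "code"
    content: str


@dataclass
class Role:
    """One step of the SOP: a named role that turns one message kind into another."""
    name: str
    consumes: str
    produces: str
    act: Callable[[str], str]  # placeholder for a language-model call

    def handle(self, msg: Message) -> Message:
        # Reject hand-offs that violate the SOP order.
        if msg.kind != self.consumes:
            raise ValueError(
                f"{self.name} expects '{self.consumes}', got '{msg.kind}'"
            )
        return Message(sender=self.name, kind=self.produces,
                       content=self.act(msg.content))


def run_sop(roles: List[Role], requirement: str) -> List[Message]:
    """Run the roles in SOP order and return the full message history."""
    history = [Message(sender="user", kind="requirement", content=requirement)]
    for role in roles:
        history.append(role.handle(history[-1]))
    return history


# Hypothetical roles; each lambda stands in for a prompt sent to a model.
sop = [
    Role("ProductManager", "requirement", "prd",
         lambda text: f"PRD for: {text}"),
    Role("Architect", "prd", "design",
         lambda text: f"Design based on [{text}]"),
    Role("Engineer", "design", "code",
         lambda text: f"# code implementing [{text}]"),
]

history = run_sop(sop, "a snake game")
for msg in history:
    print(f"{msg.sender:>14} -> {msg.kind}: {msg.content}")
```

Because each role declares what it consumes and what it produces, putting the roles in the wrong order fails immediately with a ValueError instead of silently passing the wrong artifact downstream, which is one way to read "the SOP programs the team".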

MetaGPT’s technical practice in building exclusive AI Agent—Interviews with guests at the 2023 Global Machine Learning Technology Conference

Origin blog.csdn.net/qq_33039859/article/details/132758283