Rasa practical series (Rasa course, training, interview): using the Hugging Face bert-base-chinese model with Rasa

Hugging Face

Hugging Face is building the AI community of the future: a platform to build, train, and deploy state-of-the-art machine learning models powered by open source.

https://huggingface.co/

Downloading the bert-base-chinese model

Download address: https://huggingface.co/models?sort=downloads&search=bert-base-chinese

Model files:

If the download fails for network reasons, readers can also fetch the TensorFlow bert-base-cased model from the Baidu netdisk.
Link: https://pan.baidu.com/s/1KyAUdEdEXi3v1-HyNajNQg
Extraction code: 4xd0

Test code for the TensorFlow model; when run, the model is automatically downloaded and cached under C:\Users\admin\.cache\huggingface\transformers.

# -*- coding: utf-8 -*-
from transformers import AutoTokenizer, TFAutoModel

model_name = "bert-base-chinese"
text = "this is a test"

# Load the tokenizer and encode the text into a TensorFlow tensor of token ids
tokenizer = AutoTokenizer.from_pretrained(model_name)
text_tensor = tokenizer.encode(text, return_tensors="tf")
print(text_tensor)

# Load the TensorFlow BERT model and run a forward pass
model = TFAutoModel.from_pretrained(model_name)
output = model(text_tensor)
print(output)

In E:\anaconda3\envs\installingrasa\Lib\site-packages\transformers\file_utils.py, you can also change the default TensorFlow cache location to a directory of your choosing (this article uses D:\2022_NOC_AI_RASA_NEW\bert-base-cased):
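If you would rather not patch the library source, the cache location can usually be changed from outside instead. A minimal sketch, assuming a standard transformers installation, with the directory path taken from this article:

```python
import os

# Point the transformers cache at a custom directory via the TRANSFORMERS_CACHE
# environment variable (the same default that file_utils.py reads).
# Must be set before transformers is imported for the first time.
os.environ["TRANSFORMERS_CACHE"] = r"D:\2022_NOC_AI_RASA_NEW\bert-base-cased"

# Alternatively, pass cache_dir directly when loading (commented out so this
# sketch does not trigger a network download):
# from transformers import TFAutoModel
# model = TFAutoModel.from_pretrained(
#     "bert-base-chinese",
#     cache_dir=r"D:\2022_NOC_AI_RASA_NEW\bert-base-cased",
# )
```

Either approach keeps the site-packages sources untouched, so the override survives a transformers upgrade.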
Results:

The cached_path function in the transformers file_utils.py, with the cache directory overridden:

def cached_path(
    url_or_filename,
    cache_dir=None,
    force_download=False,
    proxies=None,
    resume_download=False,
    user_agent: Union[Dict, str, None] = None,
    extract_compressed_file=False,
    force_extract=False,
    use_auth_token: Union[bool, str, None] = None,
    local_files_only=False,
) -> Optional[str]:
    """
    Given something that might be a URL (or might be a local path), determine which. If it's a URL, download the file
    and cache it, and return the path to the cached file. If it's already a local path, make sure the file exists and
    then return the path

    Args:
        cache_dir: specify a cache directory to save the file to (overwrite the default cache dir).
        force_download: if True, re-download the file even if it's already cached in the cache dir.
        resume_download: if True, resume the download if incompletely received file is found.
        user_agent: Optional string or dict that will be appended to the user-agent on remote requests.
        use_auth_token: Optional string or boolean to use as Bearer token for remote files. If True,
            will get token from ~/.huggingface.
        extract_compressed_file: if True and the path point to a zip or tar file, extract the compressed
            file in a folder along the archive.
        force_extract: if True when extract_compressed_file is True and the archive was already extracted,
            re-extract the archive and override the folder where it was extracted.

    Return:
        Local path (string) of file or if networking is off, last version of file cached on disk.

    Raises:
        In case of non-recoverable file (non-existent or inaccessible url + no cache on disk).
    """
    if cache_dir is None: 
        # cache_dir = TRANSFORMERS_CACHE 
        cache_dir = "D:\\2022_NOC_AI_RASA_NEW\\bert-base-cased"
    if isinstance(url_or_filename, Path):
        url_or_filename = str(url_or_filename)
    if isinstance(cache_dir, Path):
        cache_dir = str(cache_dir)

    if is_offline_mode() and not local_files_only:
        logger.info("Offline mode: forcing local_files_only=True")
        local_files_only = True

    if is_remote_url(url_or_filename):
        # URL, so get it from the cache (downloading if necessary)
        output_path = get_from_cache(
            url_or_filename,
            cache_dir=cache_dir,
            force_download=force_download,
            proxies=proxies,
            resume_download=resume_download,
            user_agent=user_agent,
            use_auth_token=use_auth_token,
            local_files_only=local_files_only,
        )
    elif os.path.exists(url_or_filename):
        # File, and it exists.
        output_path = url_or_filename
    elif urlparse(url_or_filename).scheme == "":
        # File, but it doesn't exist.
        raise EnvironmentError(f"file {url_or_filename} not found")
    else:
        # Something unknown
        raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")

    if extract_compressed_file:
        if not is_zipfile(output_path) and not tarfile.is_tarfile(output_path):
            return output_path

        # Path where we extract compressed archives
        # We avoid '.' in dir name and add "-extracted" at the end: "./model.zip" => "./model-zip-extracted/"
        output_dir, output_file = os.path.split(output_path)
        output_extract_dir_name = output_file.replace(".", "-") + "-extracted"
        output_path_extracted = os.path.join(output_dir, output_extract_dir_name)

        if os.path.isdir(output_path_extracted) and os.listdir(output_path_extracted) and not force_extract:
            return output_path_extracted

        # Prevent parallel extractions
        lock_path = output_path + ".lock"
        with FileLock(lock_path):
            shutil.rmtree(output_path_extracted, ignore_errors=True)
            os.makedirs(output_path_extracted)
            if is_zipfile(output_path):
                with ZipFile(output_path, "r") as zip_file:
                    zip_file.extractall(output_path_extracted)
                    zip_file.close()
            elif tarfile.is_tarfile(output_path):
                tar_file = tarfile.open(output_path)
                tar_file.extractall(output_path_extracted)
                tar_file.close()
            else:
                raise EnvironmentError(f"Archive format of {output_path} could not be identified")

        return output_path_extracted

    return output_path
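The archive-extraction branch above derives its output directory purely from the archive's file name ("./model.zip" => "./model-zip-extracted/"). A small self-contained sketch of that naming rule, using only the standard library:

```python
import os


def extraction_dir(output_path):
    # Mirrors the naming used in cached_path: '.' in the file name is replaced
    # with '-' and '-extracted' is appended, so "./model.zip" maps to a
    # sibling directory named "model-zip-extracted".
    output_dir, output_file = os.path.split(output_path)
    name = output_file.replace(".", "-") + "-extracted"
    return os.path.join(output_dir, name)


print(extraction_dir("./model.zip"))     # ends with "model-zip-extracted"
print(extraction_dir("model.tar.gz"))    # ends with "model-tar-gz-extracted"
```

Because every dot is replaced, a ".tar.gz" archive and a ".tar" archive of the same model never collide in the same cache directory.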

Rasa pipeline configuration with the transformer model (config.yml)

recipe: default.v1
language: zh
pipeline:
#  - name: JiebaTokenizer
  - name: components.custom_jieba_tokenizer.Custom_JiebaTokenizer
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-chinese"
  - name: RegexFeaturizer
  - name: RegexEntityExtractor
  - name: DIETClassifier
    epochs: 100
    tensorboard_log_directory: ./log
    learning_rate: 0.001
  - name: ResponseSelector
    epochs: 1000
    learning_rate: 0.001
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
  - name: EntitySynonymMapper
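The FallbackClassifier entry above routes a message to nlu_fallback when the top intent is uncertain. A simplified sketch of that decision rule, illustrating the documented semantics of threshold and ambiguity_threshold (this is not Rasa's actual implementation):

```python
def choose_intent(confidences, threshold=0.4, ambiguity_threshold=0.1):
    """Return the winning intent name, or 'nlu_fallback' when uncertain.

    Fall back when the top confidence is below `threshold`, or when the top
    two intents are within `ambiguity_threshold` of each other.
    """
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    top_name, top_conf = ranked[0]
    second_conf = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_conf < threshold or (top_conf - second_conf) < ambiguity_threshold:
        return "nlu_fallback"
    return top_name


print(choose_intent({"greet": 0.92, "goodbye": 0.05}))  # greet
print(choose_intent({"greet": 0.45, "goodbye": 0.40}))  # nlu_fallback (ambiguous)
```

Raising threshold makes the bot ask for clarification more often; raising ambiguity_threshold makes it more cautious when two intents score similarly.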

Test case running results:

Rasa Community Advice


Rasa in Action

Rasa book recommendation: Rasa in Action: Building an Open Source Conversational Bot introduces the workflow of Rasa's two core components, Rasa NLU and Rasa Core, and then details, from scratch, the overall process of building, configuring, training, and serving different types of conversational bots with the Rasa ecosystem.

Rasa 3.x blog series

Origin blog.csdn.net/duan_zhihua/article/details/123660072