Microsoft Phi3 Moe Demo

Phi3 Moe Demo

code09.UpdateSamplesmicrosoft-phi-cookbookAug

alph-notebooks/microsoft-phi-cookbook / phi3_moe_demo.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

[1]

[2]

[3]

[4]

[5]

[6]

[7]

Loading checkpoint shards:   0%|          | 0/17 [00:00<?, ?it/s]

WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.

[8]

PhiMoEForCausalLM(
  (model): PhiMoEModel(
    (embed_tokens): Embedding(32064, 4096)
    (layers): ModuleList(
      (0-31): 32 x PhiMoEDecoderLayer(
        (self_attn): PhiMoESdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=True)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=True)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (rotary_emb): Phi3LongRoPEScaledRotaryEmbedding()
        )
        (block_sparse_moe): PhiMoESparseMoeBlock(
          (gate): Linear(in_features=4096, out_features=16, bias=False)
          (experts): ModuleList(
            (0-15): 16 x PhiMoEBlockSparseTop2MLP(
              (w1): Linear(in_features=4096, out_features=6400, bias=False)
              (w2): Linear(in_features=6400, out_features=4096, bias=False)
              (w3): Linear(in_features=4096, out_features=6400, bias=False)
              (act_fn): SiLU()
            )
          )
        )
        (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      )
    )
    (norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=32064, bias=True)
)

[9]

[10]

2024-08-20 04:51:54.431510: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-08-20 04:51:54.431543: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
The model 'PhiMoEForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'Mamba2ForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM', 'MusicgenForCausalLM', 'MusicgenMelodyForCausalLM', 'MvpForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'RecurrentGemmaForCausalLM', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'WhisperForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].

[11]

[12]

[13]

[14]

[15]

[16]

'<|system|> You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:\n\n- Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content\n- Translate: This is a tool that helps you translate into any language, using plain language as required\n\nTo use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question, "Build Muliti Agents with MOE models" you must use the calculator tool like so:\n\n```json\n\n{\n    "tool_name": "Blog",\n    "input": "Build Muliti Agents with MOE models"\n}\n\n```\n\nOr to translate the question "can you introduce yourself in Chinese" you must respond:\n\n```json\n\n{\n    "tool_name": "Search",\n    "input": "can you introduce yourself in Chinese"\n}\n\n```\n\nRemember just output the final result, ouput in JSON format containing `"agentid"`,`"tool_name"` , `"input"` and `"output"`  key-value pairs .:\n\n```json\n\n[\n\n\n{   "agentid": "step1",\n    "tool_name": "Blog",\n    "input": "Build Muliti Agents with MOE models",\n    "output": "........."\n},\n\n{   "agentid": "step2",\n    "tool_name": "Search",\n    "input": "can you introduce yourself in Chinese",\n    "output": "........."\n},\n{\n    "agentid": "final"\n    "tool_name": "Result",\n    "output": "........."\n}\n]\n\n```\n\nThe users answer is as follows.\n <|end|>\n<|user|> Write something about Generative AI with MOE , translate it to Chinese <|end|>\n<|assistant|>'

[17]

[18]

/anaconda/envs/azureml_py38/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.3` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.

[19]

' ```json\n\n[\n\n{   "agentid": "step1",\n    "tool_name": "Blog",\n    "input": "Generative AI with MOE",\n    "output": "Generative AI with MOE (Mixture of Experts) is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more."\n},\n\n{   "agentid": "step2",\n    "tool_name": "Translate",\n    "input": "Generative AI with MOE is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more.",\n    "output": "基于生成AI的MOE（Mixture of Experts）是一种强大的方法，它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容，使其适用于各种应用，如内容创建、数据增强等。"\n},\n{\n    "agentid": "final",\n    "tool_name": "Result",\n    "output": "基于生成AI的MOE（Mixture of Experts）是一种强大的方法，它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容，使其适用于各种应用，如内容创建、数据增强等。"\n}\n]\n```'