
Disable attention mask during new token generation #1688

Closed · wants to merge 3 commits from disable_attn_mask

Conversation

@Andrei-Aksionov (Collaborator) commented Aug 21, 2024

Hey there 👋

This is a draft (a very rough one) to check my assumption that you don't need an attention mask during generation with a batch size of 1 and the kv-cache enabled. An attention mask is needed during the prefill stage, but not during token-by-token generation.

The thing is, with SDPA, if a mask is provided then Flash Attention is disabled, which makes attention slower.
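For context, here is a minimal sketch of that dispatch effect (not code from this PR; the tensor shapes and dtypes are illustrative, and the `sdpa_kernel` check assumes CUDA with PyTorch >= 2.3). With batch size 1 and a kv-cache, the decode-step mask is all-True anyway, so dropping it changes nothing numerically while letting SDPA pick the flash kernel:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel  # PyTorch >= 2.3

# Sketch of a single decode step with a kv-cache (shapes are illustrative):
# the query is the one new token; keys/values are the full cache so far.
# Layout: (batch, n_heads, seq_len, head_dim). Flash Attention needs
# CUDA and fp16/bf16, so those are assumed here.
q = torch.randn(1, 32, 1, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 32, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 32, 128, 64, device="cuda", dtype=torch.float16)

# No mask: SDPA is free to dispatch to the Flash Attention kernel.
y_no_mask = F.scaled_dot_product_attention(q, k, v)

# Explicit mask: with batch size 1 and a kv-cache, the mask for a decode
# step is all-True (the new token may attend to every cached position),
# so it changes nothing numerically, but it blocks the flash kernel.
mask = torch.ones(1, 1, 1, 128, dtype=torch.bool, device="cuda")
y_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

print((y_no_mask - y_masked).abs().max())  # ~0: the mask was a no-op

# One way to see the dispatch directly: restrict SDPA to the flash backend.
# The unmasked call runs; adding attn_mask=mask inside this context would
# raise, since the flash kernel does not accept an arbitrary mask.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    F.scaled_dot_product_attention(q, k, v)  # dispatches to flash
```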


Important

Work is still in progress.
The output is not fully identical, and the speedup is not significant.

With the prompt:

litgpt generate microsoft/phi-2 --prompt "Tell me a very long story about llamas" --max_new_tokens=500

Current main:

Output: In the land of llamas, a wise old llama named Llamasaurus lived in a cozy little burrow. He was known for his vast knowledge and his ability to solve any problem. One day, a group of adventurous children stumbled upon his burrow and asked if he could teach them something amazing. Llamasaurus thought for a moment and decided to teach them about the power of llama magic. He showed them a small bag of llama dust and instructed them to sprinkle it in the air. Suddenly, the room filled with a gentle breeze, and the furniture started moving on its own. The children were amazed and couldn't believe their eyes. Llamasaurus explained that llama magic was a special gift that llamas possessed. He encouraged the children to use their newfound powers responsibly and to always remember the importance of kindness. From that day on, the children became known as the Llama Magicians, and they used their powers to bring joy and happiness to everyone they met. Llamasaurus was proud of their accomplishments and knew that he had passed down an important lesson to the next generation.

Time for inference 1: 4.86 sec total, 46.93 tokens/sec
Memory used: 5.76 GB

This PR:

Output: In the land of llamas, a wise old llama named Llamasaurus lived in a cozy little burrow. He was known for his vast knowledge and his ability to predict the weather. One day, as he was taking a leisurely stroll through the meadow, he noticed a strange phenomenon in the sky. The clouds were swirling in a pattern that he had never seen before. Intrigued, Llamasaurus decided to investigate. He climbed to the top of the highest hill and watched as the clouds transformed into different shapes and colors. It was like nothing he had ever seen before. As he watched, the clouds began to form the shape of a llama. Llamasaurus couldn't help but laugh at the sight. It was as if the sky was playing a game with him. He knew he had to share this incredible experience with others. So, he gathered all the llamas in the land and told them about the magical llama in the sky. They were amazed and couldn't wait to witness it for themselves. The next day, the llamas gathered at the hilltop and waited patiently. And just as Llamasaurus had predicted, the clouds transformed into a magnificent llama. It was a sight to behold. The llamas celebrated and danced in joy, grateful to Llamasaurus for sharing such a wonderful experience. From that day on, whenever the llamas looked up at the sky, they would remember the magical llama and the wise old llama who had brought them together.

Time for inference 1: 6.49 sec total, 48.07 tokens/sec
Memory used: 5.76 GB

@rasbt deleted the disable_attn_mask branch September 24, 2024 17:15