Update README.md
We can run the quantized model on a mobile phone using [ExecuTorch](https://github.com/pytorch/executorch).
Once ExecuTorch is [set up](https://pytorch.org/executorch/main/getting-started.html), exporting and running the model on device is a breeze.
We first convert the quantized checkpoint to the format ExecuTorch's LLM export script expects by renaming some of the checkpoint keys.
The following command does this for you.
```
python -m executorch.examples.models.phi_4_mini.convert_weights phi4-mini-8dq4w.bin phi4-mini-8dq4w-converted.bin
```
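Under the hood, a conversion like this is mostly a mechanical rename of checkpoint keys. The sketch below illustrates the general idea; the prefix mapping and key names are hypothetical, not the real mapping used by `convert_weights`.

```python
# Minimal sketch of checkpoint key renaming (illustrative mapping only;
# the real phi_4_mini mapping lives in executorch's convert_weights script).
def rename_keys(state_dict, prefix_map):
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                # Swap the matching prefix, keep the rest of the key.
                new_key = new + key[len(old):]
                break
        renamed[new_key] = value
    return renamed

# Hypothetical example: strip a "model." wrapper from every key.
ckpt = {"model.layers.0.attn.weight": "..."}
print(rename_keys(ckpt, {"model.": ""}))
# {'layers.0.attn.weight': '...'}
```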
Once the checkpoint is converted, we can export to ExecuTorch's PTE format with the XNNPACK delegate.
```
PARAMS="executorch/examples/models/phi_4_mini/config.json"
python -m executorch.examples.models.llama.export_llama \
  ...
```
## Running in a mobile app
The PTE file can be run with ExecuTorch on a mobile phone. See the [instructions](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for doing this on iOS.
On iPhone 15 Pro, the model runs at 17.3 tokens/sec and uses 3206 MB of memory.
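For a rough sense of what that throughput means in practice, here is a back-of-envelope latency estimate (the response length is an illustrative assumption, not a measurement):

```python
# Rough decode-latency estimate from the measured throughput above.
tokens_per_sec = 17.3   # measured on iPhone 15 Pro (from the README)
response_tokens = 200   # hypothetical response length

latency_sec = response_tokens / tokens_per_sec
print(f"~{latency_sec:.1f} s to decode {response_tokens} tokens")
# ~11.6 s to decode 200 tokens
```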