From e9d4322fbfeaf894e351deee781f354031c0f38a Mon Sep 17 00:00:00 2001
From: Konstantin
Date: Tue, 17 Jan 2023 19:19:05 +0100
Subject: Added GeForce 1650 results to performance measures

---
 SampleClips/Readme.txt               | 18 ++++++++++-----
 SampleClips/columbia-large-1650.txt  | 43 ++++++++++++++++++++++++++++++++++++
 SampleClips/columbia-medium-1650.txt | 43 ++++++++++++++++++++++++++++++++++++
 SampleClips/jfk-large-1650.txt       | 43 ++++++++++++++++++++++++++++++++++++
 SampleClips/jfk-medium-1650.txt      | 43 ++++++++++++++++++++++++++++++++++++
 5 files changed, 184 insertions(+), 6 deletions(-)
 create mode 100644 SampleClips/columbia-large-1650.txt
 create mode 100644 SampleClips/columbia-medium-1650.txt
 create mode 100644 SampleClips/jfk-large-1650.txt
 create mode 100644 SampleClips/jfk-medium-1650.txt

(limited to 'SampleClips')

diff --git a/SampleClips/Readme.txt b/SampleClips/Readme.txt
index 69432f4..0945728 100644
--- a/SampleClips/Readme.txt
+++ b/SampleClips/Readme.txt
@@ -3,15 +3,21 @@ jfk.wav is from whisper.cpp repository.
 columbia.wma is from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg
-I had to re-encoded the audio from Ogg Vorbis into Windows Media Audio, because Media Foundation is unable to decode Vorbis.
+I re-encoded the audio from Ogg Vorbis into Windows Media Audio, because Media Foundation is unable to decode Vorbis.
-The rest of the text files in this folder are the outputs of the in-app performance profiler, when the app was transcribing these two audio clips on two different computers.
+The rest of the text files in this folder are the outputs of the in-app performance profiler, when the app was transcribing these two audio clips on three different computers.
-The files names containing `1080ti` are from my desktop, which has nVidia GeForce 1080Ti GPU.
+The “1080ti” files are from my desktop, which has nVidia GeForce 1080Ti GPU.
-The files names containing `vega7` are from my laptop, the GPU is integrated into AMD Ryzen 5 5600U processor.
+The “vega7” files are from my laptop, the GPU is integrated into AMD Ryzen 5 5600U processor.
 The laptop model is HP ProBook 445 G8. While running the tests, the laptop was on battery power.
-The files names with `medium` in the middle were made with `ggml-medium.bin` Whisper model.
+The “1650” files are from another desktop with nVidia GeForce 1650.
-The files with `large` were made with `ggml-large.bin` model.
\ No newline at end of file
+The file names with “medium” in the middle were made with “ggml-medium.bin” Whisper model, with “large” were made with “ggml-large.bin” model.
+
+In theory, 1080ti delivers 10.6 TFlops FP32 and 484.4 GB/second VRAM bandwidth.
+
+That variant of 1650 delivers 2.6 TFlops FP32, and 128.1 GB/second VRAM bandwidth.
+
+The AMD APU in that laptop delivers 1.6 TFlops FP32, and 51.2 GB/second memory bandwidth.
\ No newline at end of file
diff --git a/SampleClips/columbia-large-1650.txt b/SampleClips/columbia-large-1650.txt
new file mode 100644
index 0000000..ffc9e5c
--- /dev/null
+++ b/SampleClips/columbia-large-1650.txt
@@ -0,0 +1,43 @@
+ CPU Tasks
+LoadModel 1.39046 seconds
+RunComplete 98.7705 seconds
+Run 98.6893 seconds
+Callbacks 10.9446 milliseconds, 44 calls, 248.741 microseconds average
+Spectrogram 1.10864 seconds, 41 calls, 27.04 milliseconds average
+Sample 62.5537 milliseconds, 527 calls, 118.698 microseconds average
+Encode 60.6321 seconds, 9 calls, 6.7369 seconds average
+Decode 38.0118 seconds, 9 calls, 4.22353 seconds average
+DecodeStep 37.949 seconds, 527 calls, 72.0095 milliseconds average
+ GPU Tasks
+LoadModel 1.19991 seconds
+Run 98.4248 seconds
+Encode 61.0298 seconds, 9 calls, 6.78109 seconds average
+EncodeLayer 51.7844 seconds, 288 calls, 179.807 milliseconds average
+Decode 37.395 seconds, 9 calls, 4.155 seconds average
+DecodeStep 37.3947 seconds, 527 calls, 70.9577 milliseconds average
+DecodeLayer 34.8821 seconds, 16864 calls, 2.06843 milliseconds average
+ Compute Shaders
+mulMatTiled 65.2919 seconds, 6345 calls, 10.2903 milliseconds average
+mulMatByRowTiled 22.3701 seconds, 199430 calls, 112.17 microseconds average
+convolutionMain2Fixed 1.37801 seconds, 9 calls, 153.113 milliseconds average
+softMaxFixed 1.32519 seconds, 17152 calls, 77.2618 microseconds average
+addRepeat 1.0237 seconds, 68896 calls, 14.8586 microseconds average
+copyTranspose 974.149 milliseconds, 34304 calls, 28.3975 microseconds average
+norm 971.572 milliseconds, 51704 calls, 18.791 microseconds average
+softMax 956.611 milliseconds, 17391 calls, 55.0061 microseconds average
+copyConvert 899.362 milliseconds, 34880 calls, 25.7845 microseconds average
+fmaRepeat1 675.729 milliseconds, 51704 calls, 13.0692 microseconds average
+addRepeatGelu 531.623 milliseconds, 17170 calls, 30.9623 microseconds average
+addInPlace 461.61 milliseconds, 34304 calls, 13.4564 microseconds average
+scaleInPlace 394.457 milliseconds, 17152 calls, 22.9978 microseconds average
+convolutionMain 331.124 milliseconds, 9 calls, 36.7915 milliseconds average
+addRepeatScale 329.854 milliseconds, 33728 calls, 9.77983 microseconds average
+add 203.376 milliseconds, 16873 calls, 12.0534 microseconds average
+diagMaskInf 107.127 milliseconds, 16864 calls, 6.3524 microseconds average
+convolutionPrep1 58.8876 milliseconds, 18 calls, 3.27153 milliseconds average
+convolutionPrep2 9.1367 milliseconds, 18 calls, 507.594 microseconds average
+addRows 3.6551 milliseconds, 527 calls, 6.93567 microseconds average
+ Memory Usage
+Model 892.591 KB RAM, 2.8815 GB VRAM
+Context 92.2616 MB RAM, 1.20719 GB VRAM
+Total 93.1333 MB RAM, 4.08869 GB VRAM
diff --git a/SampleClips/columbia-medium-1650.txt b/SampleClips/columbia-medium-1650.txt
new file mode 100644
index 0000000..10d6984
--- /dev/null
+++ b/SampleClips/columbia-medium-1650.txt
@@ -0,0 +1,43 @@
+ CPU Tasks
+LoadModel 818.374 milliseconds
+RunComplete 55.336 seconds
+Run 55.238 seconds
+Callbacks 8.3113 milliseconds, 37 calls, 224.63 microseconds average
+Spectrogram 1.11163 seconds, 42 calls, 26.4674 milliseconds average
+Sample 59.2017 milliseconds, 511 calls, 115.855 microseconds average
+Encode 33.7839 seconds, 10 calls, 3.37839 seconds average
+Decode 21.4456 seconds, 10 calls, 2.14456 seconds average
+DecodeStep 21.3862 seconds, 511 calls, 41.8517 milliseconds average
+ GPU Tasks
+LoadModel 626.222 milliseconds
+Run 55.0407 seconds
+Encode 34.044 seconds, 10 calls, 3.4044 seconds average
+EncodeLayer 28.8064 seconds, 240 calls, 120.027 milliseconds average
+Decode 20.9967 seconds, 10 calls, 2.09967 seconds average
+DecodeStep 20.9967 seconds, 511 calls, 41.0894 milliseconds average
+DecodeLayer 19.0732 seconds, 12264 calls, 1.55522 milliseconds average
+ Compute Shaders
+mulMatTiled 36.347 seconds, 5290 calls, 6.87089 milliseconds average
+mulMatByRowTiled 12.1268 seconds, 144789 calls, 83.7549 microseconds average
+convolutionMain2Fixed 956.94 milliseconds, 10 calls, 95.694 milliseconds average
+softMaxFixed 878.266 milliseconds, 12504 calls, 70.2388 microseconds average
+softMax 708.091 milliseconds, 12775 calls, 55.4279 microseconds average
+addRepeat 648.271 milliseconds, 50256 calls, 12.8994 microseconds average
+copyConvert 532.099 milliseconds, 25488 calls, 20.8764 microseconds average
+copyTranspose 467.681 milliseconds, 25008 calls, 18.7013 microseconds average
+normFixed 393.9 milliseconds, 37793 calls, 10.4226 microseconds average
+addRepeatGelu 354.445 milliseconds, 12524 calls, 28.3013 microseconds average
+fmaRepeat1 348.257 milliseconds, 37793 calls, 9.21484 microseconds average
+addInPlace 308.862 milliseconds, 25008 calls, 12.3505 microseconds average
+convolutionMain 278.894 milliseconds, 10 calls, 27.8894 milliseconds average
+addRepeatScale 199.387 milliseconds, 24528 calls, 8.12898 microseconds average
+scaleInPlace 197.51 milliseconds, 12504 calls, 15.7958 microseconds average
+add 134.664 milliseconds, 12274 calls, 10.9715 microseconds average
+diagMaskInf 57.9927 milliseconds, 12264 calls, 4.72869 microseconds average
+convolutionPrep1 41.0155 milliseconds, 20 calls, 2.05077 milliseconds average
+convolutionPrep2 8.0689 milliseconds, 20 calls, 403.445 microseconds average
+addRows 3.1188 milliseconds, 511 calls, 6.10333 microseconds average
+ Memory Usage
+Model 877.966 KB RAM, 1.42785 GB VRAM
+Context 91.0719 MB RAM, 841.634 MB VRAM
+Total 91.9293 MB RAM, 2.24976 GB VRAM
diff --git a/SampleClips/jfk-large-1650.txt b/SampleClips/jfk-large-1650.txt
new file mode 100644
index 0000000..9c6e4b8
--- /dev/null
+++ b/SampleClips/jfk-large-1650.txt
@@ -0,0 +1,43 @@
+ CPU Tasks
+LoadModel 1.4018 seconds
+RunComplete 8.71063 seconds
+Run 8.64303 seconds
+Callbacks 251.9 microseconds, 4 calls, 62.975 microseconds average
+Spectrogram 62.1203 milliseconds, 3 calls, 20.7068 milliseconds average
+Sample 3.5493 milliseconds, 27 calls, 131.456 microseconds average
+Encode 6.90879 seconds
+Decode 1.73396 seconds
+DecodeStep 1.73039 seconds, 27 calls, 64.0887 milliseconds average
+ GPU Tasks
+LoadModel 1.20907 seconds
+Run 8.4523 seconds
+Encode 6.83046 seconds
+EncodeLayer 5.71692 seconds, 32 calls, 178.654 milliseconds average
+Decode 1.62184 seconds
+DecodeStep 1.62184 seconds, 27 calls, 60.068 milliseconds average
+DecodeLayer 1.51049 seconds, 864 calls, 1.74825 milliseconds average
+ Compute Shaders
+mulMatTiled 6.39268 seconds, 705 calls, 9.06763 milliseconds average
+mulMatByRowTiled 1.09505 seconds, 10010 calls, 109.395 microseconds average
+convolutionMain2Fixed 155.164 milliseconds
+convolutionMain 123.525 milliseconds
+softMaxFixed 120.173 milliseconds, 896 calls, 134.122 microseconds average
+norm 84.1752 milliseconds, 2684 calls, 31.3618 microseconds average
+copyConvert 78.0956 milliseconds, 1856 calls, 42.0774 microseconds average
+addRepeat 63.3192 milliseconds, 3616 calls, 17.5108 microseconds average
+fmaRepeat1 56.6908 milliseconds, 2684 calls, 21.1218 microseconds average
+softMax 54.6717 milliseconds, 891 calls, 61.3599 microseconds average
+addInPlace 39.7892 milliseconds, 1792 calls, 22.2038 microseconds average
+copyTranspose 38.8897 milliseconds, 1792 calls, 21.7018 microseconds average
+addRepeatGelu 34.762 milliseconds, 898 calls, 38.7105 microseconds average
+add 33.3001 milliseconds, 865 calls, 38.4972 microseconds average
+scaleInPlace 24.343 milliseconds, 896 calls, 27.1685 microseconds average
+addRepeatScale 18.8872 milliseconds, 1728 calls, 10.9301 microseconds average
+convolutionPrep1 7.8052 milliseconds, 2 calls, 3.9026 milliseconds average
+diagMaskInf 4.1647 milliseconds, 864 calls, 4.82025 microseconds average
+convolutionPrep2 1.209 milliseconds, 2 calls, 604.5 microseconds average
+addRows 183.6 microseconds, 27 calls, 6.8 microseconds average
+ Memory Usage
+Model 892.591 KB RAM, 2.8815 GB VRAM
+Context 1.98413 MB RAM, 1.07361 GB VRAM
+Total 2.8558 MB RAM, 3.95511 GB VRAM
diff --git a/SampleClips/jfk-medium-1650.txt b/SampleClips/jfk-medium-1650.txt
new file mode 100644
index 0000000..b072607
--- /dev/null
+++ b/SampleClips/jfk-medium-1650.txt
@@ -0,0 +1,43 @@
+ CPU Tasks
+LoadModel 818.309 milliseconds
+RunComplete 4.59853 seconds
+Run 4.51124 seconds
+Callbacks 259.1 microseconds, 4 calls, 64.775 microseconds average
+Spectrogram 62.0087 milliseconds, 3 calls, 20.6696 milliseconds average
+Sample 3.3139 milliseconds, 28 calls, 118.354 microseconds average
+Encode 3.54162 seconds
+Decode 969.342 milliseconds
+DecodeStep 966.005 milliseconds, 28 calls, 34.5002 milliseconds average
+ GPU Tasks
+LoadModel 623.002 milliseconds
+Run 4.38954 seconds
+Encode 3.46286 seconds
+EncodeLayer 2.86548 seconds, 24 calls, 119.395 milliseconds average
+Decode 926.677 milliseconds
+DecodeStep 926.674 milliseconds, 28 calls, 33.0955 milliseconds average
+DecodeLayer 843.963 milliseconds, 672 calls, 1.2559 milliseconds average
+ Compute Shaders
+mulMatTiled 3.19154 seconds, 529 calls, 6.03316 milliseconds average
+mulMatByRowTiled 628.359 milliseconds, 7803 calls, 80.5278 microseconds average
+convolutionMain2Fixed 98.3757 milliseconds
+convolutionMain 95.2955 milliseconds
+softMaxFixed 73.4031 milliseconds, 696 calls, 105.464 microseconds average
+addRepeat 58.0541 milliseconds, 2808 calls, 20.6745 microseconds average
+copyConvert 42.8539 milliseconds, 1440 calls, 29.7597 microseconds average
+softMax 37.7754 milliseconds, 700 calls, 53.9649 microseconds average
+normFixed 25.4389 milliseconds, 2093 calls, 12.1543 microseconds average
+fmaRepeat1 24.6287 milliseconds, 2093 calls, 11.7672 microseconds average
+addRepeatGelu 24.2553 milliseconds, 698 calls, 34.7497 microseconds average
+copyTranspose 24.2415 milliseconds, 1392 calls, 17.4149 microseconds average
+addInPlace 20.4598 milliseconds, 1392 calls, 14.6981 microseconds average
+scaleInPlace 12.8947 milliseconds, 696 calls, 18.5269 microseconds average
+addRepeatScale 10.8749 milliseconds, 1344 calls, 8.09144 microseconds average
+add 7.3752 milliseconds, 673 calls, 10.9587 microseconds average
+convolutionPrep1 6.0929 milliseconds, 2 calls, 3.04645 milliseconds average
+diagMaskInf 3.2818 milliseconds, 672 calls, 4.88363 microseconds average
+convolutionPrep2 1.2268 milliseconds, 2 calls, 613.4 microseconds average
+addRows 165.9 microseconds, 27 calls, 5.925 microseconds average
+ Memory Usage
+Model 877.966 KB RAM, 1.42785 GB VRAM
+Context 1.98347 MB RAM, 723.729 MB VRAM
+Total 2.84085 MB RAM, 2.13462 GB VRAM
--
cgit v1.2.3
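The profiler files added in this patch share one line shape: a name, a total time with a unit, and optionally a call count plus a per-call average. The sketch below is only an illustration of how such lines could be read back for comparison across GPUs. It is not part of the repository; the names `parse_profiler_line` and `share_of` are hypothetical, and the regular expression assumes the single-space layout shown in this patch (the real files may use tabs).

```python
import re

# Unit words as they appear in the profiler output, converted to seconds.
UNIT_SECONDS = {"seconds": 1.0, "milliseconds": 1e-3, "microseconds": 1e-6}

_UNIT = r"(seconds|milliseconds|microseconds)"

# Accepts an optional leading "+" (diff marker), then
#   "<name> <total> <unit>"                                  or
#   "<name> <total> <unit>, <calls> calls, <avg> <unit> average"
LINE_RE = re.compile(
    r"^\+?\s*(\w+)\s+([\d.]+) " + _UNIT
    + r"(?:, (\d+) calls, ([\d.]+) " + _UNIT + r" average)?\s*$"
)


def parse_profiler_line(line):
    """Return a dict with times in seconds, or None for non-timing lines
    (section headers such as "CPU Tasks", or the "Memory Usage" rows)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    name, total, unit, calls, avg, avg_unit = m.groups()
    row = {
        "name": name,
        "total_s": float(total) * UNIT_SECONDS[unit],
        "calls": None,
        "avg_s": None,
    }
    if calls is not None:
        row["calls"] = int(calls)
        row["avg_s"] = float(avg) * UNIT_SECONDS[avg_unit]
    return row


def share_of(total_line, part_line):
    """Fraction of the time in total_line that part_line accounts for."""
    total = parse_profiler_line(total_line)
    part = parse_profiler_line(part_line)
    return part["total_s"] / total["total_s"]


if __name__ == "__main__":
    # Lines copied from columbia-large-1650.txt in this patch.
    run = "+Run 98.4248 seconds"
    mul = "+mulMatTiled 65.2919 seconds, 6345 calls, 10.2903 milliseconds average"
    print(f"mulMatTiled share of GPU Run: {share_of(run, mul):.1%}")
```

Converting everything to seconds first keeps later comparisons (for example the same shader on the 1080Ti and the 1650) free of unit handling.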