Text Embedding Models – Performance Comparison

Compare Ada, Gecko, GTE-Large

A key challenge with text embeddings is that a similarity search over them does not always bring back exact-match results, primarily because of how embeddings represent language.

Text embeddings focus on capturing semantic meaning rather than exact word-to-word matches. They represent words or phrases in a continuous vector space based on their context and meaning. As a result, embeddings might prioritize semantically similar but not identical words or phrases over exact matches.
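To make the idea of "closeness in a vector space" concrete, here is a minimal sketch of cosine similarity, the measure used in the tests in this article. The three-dimensional vectors are made-up toy values, not output from any of the models discussed; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, for illustration only).
query = [0.8, 0.6, 0.0]
doc_a = [0.7, 0.7, 0.1]
doc_b = [0.0, 0.2, 0.9]

print(cosine_similarity(query, doc_a))
print(cosine_similarity(query, doc_b))
```

The score depends only on the direction of the vectors, which is why a document with different wording but similar meaning can outrank one that repeats the query word for word.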

In this article, we test each model’s ability to rank documents containing an exact match above documents that are only semantically similar.

Embedding Models

We compare three leading embedding models:

| Model | Vendor | # of dims | Pros | Cons |
|---|---|---|---|---|
| Ada-002 | OpenAI | 1536 | none listed | proprietary; calls external server; paid; most dimensions |
| Gecko-003 | Google | 768 | fewest dimensions | proprietary; calls external server; paid |
| GTE-Large | Alibaba | 1024 | runs locally (on-prem); free | requires system resources |

Test Scenario #1: exact match

In a database containing 1,000 rows, only three (3) rows contain a marker phrase, while 100+ rows contain words or phrases semantically similar to the marker phrase. We ask the PGVector DB to bring back the top 10 documents using a cosine similarity search.
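A top-10 cosine search of this kind can be sketched as follows. This is not the query used in the experiment: the table name `documents` and the columns `id`, `content` and `embedding` are hypothetical. The `<=>` operator is pgvector's cosine distance, so ordering by it in ascending order returns the most similar rows first.

```python
def to_vector_literal(embedding):
    """Format a list of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_top_k_query(table="documents", id_col="id",
                      text_col="content", vec_col="embedding"):
    """Build a parameterised top-k cosine search (all names are hypothetical)."""
    return (
        f"SELECT {id_col}, {text_col}, "
        f"1 - ({vec_col} <=> %s::vector) AS cosine_similarity "
        f"FROM {table} "
        f"ORDER BY {vec_col} <=> %s::vector "
        f"LIMIT %s"
    )


# The query embedding would come from the model under test; toy values here.
query_embedding = [0.12, -0.03, 0.57]
literal = to_vector_literal(query_embedding)

sql = build_top_k_query()
params = (literal, literal, 10)  # vector twice (SELECT and ORDER BY), then k

print(sql)
print(params)
```

The statement and parameters would then be passed to a PostgreSQL driver's `execute` call (for example psycopg, which is an assumption here). Keeping the vector as a bound parameter rather than pasting it into the SQL string avoids quoting problems.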

Scoring reflects how close to the top the rows containing the exact match land: position #1 earns 10 points, #2 earns 9 points, and so on down to 1 point for #10.
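The scoring rule can be written as a small function. This is a sketch of the rule as described above; the function names and the choice to give 0 points to rows outside the top 10 are assumptions made for illustration.

```python
def position_score(position, top_k=10):
    """Points for one marker row: 10 for position 1, 9 for position 2, and so on."""
    if not 1 <= position <= top_k:
        return 0  # assumption: rows outside the top k earn nothing
    return top_k + 1 - position


def total_score(hit_positions, top_k=10):
    """Sum the points for every position where a marker row appeared."""
    return sum(position_score(p, top_k) for p in hit_positions)


# Hypothetical example: marker rows found at positions 2 and 5.
print(total_score([2, 5]))
```

With this rule the best possible result for three marker rows is positions 1, 2 and 3, worth 10 + 9 + 8 = 27 points.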

The top 10 results contained rows with “marker phrase” as follows:

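| Result position | Score | Models with a marker-phrase row at this position |
|---|---|---|
| 1 | 10 | Gecko-003, GTE-Large |
| 2 | 9 | |
| 3 | 8 | |
| 4 | 7 | one of Gecko-003 / GTE-Large |
| 5 | 6 | one of Gecko-003 / GTE-Large |
| 6 | 5 | one of Gecko-003 / GTE-Large |
| 7 | 4 | one of Gecko-003 / GTE-Large |
| 8 | 3 | Ada-002 |
| 9 | 2 | |
| 10 | 1 | |

Total Score: Ada-002 3, Gecko-003 21, GTE-Large 21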

As you can see, both the Google and Alibaba models did a great job: all three marker rows made the first 10 results, for 21 points each. OpenAI's model, however, ranked other documents as more similar than the ones containing the exact match, returning only one of the three within the first 10.

Test Scenario #2: diluted exact match

Now we repeat Scenario #1, diluting the marker phrase with garbage words, so our query “marker phrase” becomes “marker phrase garbage words” (the database rows remain intact; only the query is diluted). Here are the results:

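| Result position | Score | Models with a marker-phrase row at this position |
|---|---|---|
| 1 | 10 | |
| 2 | 9 | |
| 3 | 8 | Gecko-003, GTE-Large |
| 4 | 7 | |
| 5 | 6 | |
| 6 | 5 | |
| 7 | 4 | Gecko-003 |
| 8 | 3 | |
| 9 | 2 | |
| 10 | 1 | Gecko-003 |

Total Score: Ada-002 0, Gecko-003 13, GTE-Large 8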

Now, this is interesting! Google, despite losing positions and scoring 8 points fewer than before, still managed to keep all three rows within the first ten results. GTE comes in second, losing two rows to other results, while OpenAI returned none of the marker rows in the first ten!

Conclusion

Even though this experiment is not exactly fair, our goal was to see if different models give different weight to the occurrence of exact matches within texts. In this particular scenario, and with the data we had, our leaderboard looks as follows:

#1: Google’s Gecko-003 with 34 points,

#2: GTE-Large with 29 points, and

#3: OpenAI’s Ada-002 scoring only 3 points.

In this case, Google’s Gecko-003 is the clear winner!
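The leaderboard is simply the sum of the two scenario totals per model (Ada-002: 3 and 0, Gecko-003: 21 and 13, GTE-Large: 21 and 8). A minimal sketch of that arithmetic, with a hypothetical `leaderboard` helper:

```python
# Total scores per model, taken from the two test scenarios.
scenario_1 = {"Ada-002": 3, "Gecko-003": 21, "GTE-Large": 21}
scenario_2 = {"Ada-002": 0, "Gecko-003": 13, "GTE-Large": 8}


def leaderboard(*scenarios):
    """Add up each model's points across scenarios and sort, highest first."""
    totals = {}
    for scores in scenarios:
        for model, points in scores.items():
            totals[model] = totals.get(model, 0) + points
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for rank, (model, points) in enumerate(leaderboard(scenario_1, scenario_2), start=1):
    print(f"#{rank}: {model} with {points} points")
```

Adding a further scenario only means passing one more dictionary to `leaderboard`.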

(PS: your experience may differ between datasets)
