Questions tagged [sentence-similarity]

Sentence similarity is a topic in Natural Language Processing concerned with computing a mathematical measure of semantic or syntactic similarity between two or more sentences.
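
As a rough illustration of what a "mathematical similarity" can look like, here is a minimal sketch of a purely lexical measure: cosine similarity over bag-of-words counts. The function names and the tokenization rule are assumptions made for this example and come from no library. This measure only captures word overlap, so semantic approaches such as the sentence embeddings discussed in the questions below behave differently.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(sentence_a, sentence_b):
    # Represent each sentence as a sparse vector of word counts.
    vec_a, vec_b = bag_of_words(sentence_a), bag_of_words(sentence_b)
    # The dot product only needs the words the two sentences share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("The cat sat on the mat", "The cat lay on the rug"))
```

Because the score depends only on shared tokens, two paraphrases with no words in common score 0.0, which is the limitation that embedding-based methods try to address.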

28 votes
3 answers
6k views

How to build semantic search for a given domain

We are trying to solve a problem where we want to do a semantic search on our data set, i.e., we have domain-specific data (example: sentences about automobiles). Our data is just ...
Jickson's user avatar
  • 5,193
26 votes
2 answers
38k views

Is there a way to check similarity between two full sentences in Python?

I am making a project like this one here: https://www.youtube.com/watch?v=dovB8uSUUXE&feature=youtu.be but I am having trouble because I need to check the similarity between the sentences for ...
Bemwa Malak's user avatar
  • 1,297
12 votes
2 answers
7k views

Sentence similarity using keras

I'm trying to implement a sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1, so it is assumed to be a regression model. My ...
lila's user avatar
  • 121
10 votes
1 answer
7k views

word2vec, sum or average word embeddings?

I'm using word2vec to represent a small phrase (3 to 4 words) as a unique vector, either by adding each individual word embedding or by calculating the average of word embeddings. From the experiments ...
David Batista's user avatar
9 votes
2 answers
4k views

Siamese Network with LSTM for sentence similarity in Keras periodically gives the same result

I'm a newbie in Keras and I'm trying to solve the task of sentence similarity using an NN in Keras. I use word2vec as the word embedding, and then a Siamese Network to predict how similar two sentences ...
MiVe93's user avatar
  • 93
8 votes
4 answers
2k views

Sentence similarity models not capturing opposite sentences

I have tried different approaches to sentence similarity, namely: spaCy models: en_core_web_md and en_core_web_lg. Transformers: using the packages sentence-similarity and sentence-transformers, I'...
Diego Miguel's user avatar
7 votes
5 answers
4k views

What is the best way to get accurate text similarity in python for comparing single words or bigrams?

I've got similar product data in both the products_a array and products_b array: products_a = [{color: "White", size: "2' 3\""}, {color: "Blue", size: "5' 8\&...
rom's user avatar
  • 596
7 votes
2 answers
4k views

How to determine if two sentences talk about similar topics?

I would like to ask you a question. Is there any algorithm/tool which can allow me to do some association between words? For example: I have the following group of sentences: (1) "My phone is ...
user avatar
4 votes
3 answers
12k views

Finding most similar sentences among all in python

Suggestions, reference links, and code are appreciated. I have a data set with more than 1500 rows. Each row has a sentence. I am trying to find out the best method to find the most similar sentences ...
vivek's user avatar
  • 61
4 votes
1 answer
3k views

sentence transformer how to predict new example

I am exploring sentence transformers and came across this page. It shows how to train on our custom data. But I am not sure how to predict. If there are two new sentences such as 1) this is the third ...
user2543622's user avatar
  • 6,228
4 votes
1 answer
1k views

Use Spacy to find most similar sentences in doc

I'm looking for a solution to use something like most_similar() from Gensim but using Spacy. I want to find the most similar sentence in a list of sentences using NLP. I tried to use similarity() ...
Heraknos's user avatar
  • 373
4 votes
0 answers
326 views

Siamese BiLSTM neural network with Manhattan distance gives very different similarity scores each time for the same test data

I'm applying Siamese Bidirectional LSTM (BiLSTM) using character-level sequences and embeddings for long texts. The embeddings model is Word2vec, the sequence length is None to handle variable ...
MManahi's user avatar
  • 41
3 votes
4 answers
3k views

How to save a SetFit trainer locally after training

I am working on an HPC with no internet access on worker nodes and the only option to save a SetFit trainer after training, is to push it to HuggingFace hub. How do I go about saving it locally to ...
Tanish Bafna's user avatar
3 votes
3 answers
1k views

String comparison with BERT seems to ignore "not" in sentence

I implemented a string comparison method using SentenceTransformers and BERT as follows: from sentence_transformers import SentenceTransformer from sklearn.metrics.pairwise import cosine_similarity ...
Tiago Bachiega de Almeida's user avatar
3 votes
1 answer
7k views

fastText pre-trained sentence similarity

I want to use fastText pre-trained models to compute the similarity between a sentence and a set of sentences. Can anyone help me? What is the best approach? I computed the similarity between sentences by ...
mili lali's user avatar
3 votes
1 answer
2k views

Does Euclidean Distance measure the semantic similarity?

I want to measure the similarity between sentences. Can I use sklearn and Euclidean distance to measure the semantic similarity between sentences? I have also read about cosine similarity. Can someone ...
jenyK's user avatar
  • 71
3 votes
3 answers
5k views

Calculating words similarity score in python

I'm trying to calculate book similarity by comparing their topic lists. I need to get a similarity score between 0 and 1 from the two lists. Example: book1_topics = ["god", "bible", "...
Sapir's user avatar
  • 31
3 votes
2 answers
5k views

BERT fine-tuned for semantic similarity

I would like to fine-tune BERT to calculate semantic similarity between sentences. I searched a lot of websites, but found almost nothing downstream about this. I just found the STS benchmark. I wonder ...
Chad's user avatar
  • 41
3 votes
1 answer
2k views

Gensim Doc2Vec most_similar() method not working as expected

I am struggling with Doc2Vec and I cannot see what I am doing wrong. I have a text file with sentences. I want to know, for a given sentence, what is the closest sentence we can find in that file. ...
Yann Droy's user avatar
  • 177
3 votes
1 answer
1k views

How to download and use the universal sentence encoder instead of loading it from url

I am using the universal sentence encoder to find sentence similarity. Below is the code that I use to load the model: import tensorflow_hub as hub model = hub.load("https://tfhub.dev/google/...
Jithin P James's user avatar
3 votes
1 answer
2k views

Using a Universal Sentence Encoder Embedding Layer in Keras

I am trying to load USE as an embedding layer in my model using Keras. I used two approaches. The first one is adapted from the code here as follows: import tensorflow as tf tf.config....
Omnia's user avatar
  • 857
3 votes
2 answers
2k views

How can I use NLP to group multiple sentences by semantic similarity?

I'm trying to increase the efficiency of a non-conformity management program. Basically, I have a database containing a few hundred rows; each row describes a non-conformity using a text field. ...
Michael Longo's user avatar
3 votes
2 answers
2k views

converting a sentence to an embedding representation

If I have a sentence, e.g. “get out of here”, and I want to use word2vec embeddings to represent it, I found three different ways to do that: 1- for each word, we compute the AVG of its embedding vector, ...
Minions's user avatar
  • 5,318
3 votes
1 answer
164 views

Batched BM25 search in PySpark

I have a large dataset of documents (average length of 35 words). I want to find the top k nearest neighbors of all these documents by using BM25. Every document needs to be compared with every other ...
theodre7's user avatar
  • 108
3 votes
1 answer
1k views

how to use sentence bert with transformers and torch

I would like to use sentence_transformers, but due to policy restrictions I cannot install the sentence-transformers package. I have the transformers and torch packages, though. I went to this page and tried ...
user2543622's user avatar
  • 6,228
3 votes
0 answers
2k views

Text similarity as probability (between 0 and 1)

I have been trying to compute text similarity such that it is between 0 and 1, so it can be read as a probability. The two texts are encoded as two vectors whose entries are numbers in [-1, 1]. So as two ...
inverted_index's user avatar
3 votes
1 answer
1k views

How to perform efficient queries with Gensim doc2vec?

I’m working on a sentence similarity algorithm with the following use case: given a new sentence, I want to retrieve its n most similar sentences from a given set. I am using Gensim v.3.7.1, and I ...
María Benavente's user avatar
3 votes
2 answers
2k views

Finding most similar sentence match

I have a large dataset containing a mix of words and short phrases, such as: dataset = [ "car", "red-car", "lorry", "broken lorry", "truck owner", "train", ... ] I am ...
user avatar
3 votes
0 answers
1k views

spark similarities between text sentences

I'm trying to find similarity between text messages (about 1 million text messages). In my implementation, each line represents an entry. In order to calculate similarity between those texts we adopt ...
jamil's user avatar
  • 51
3 votes
2 answers
628 views

Extrapolate Sentence Similarity Given Word Similarities

Assuming that I have a word similarity score for each pair of words in two sentences, what is a decent approach to determining the overall sentence similarity from those scores? The word scores are ...
Scott Klarenbach's user avatar
2 votes
2 answers
2k views

Efficient way for Computing the Similarity of Multiple Documents using Spacy

I have around 10k docs (mostly 1-2 sentences) and, for each of these docs, want to find the ten most similar docs in a collection of 60k docs. Therefore, I want to use the spaCy library. Due to the large ...
LaLeLo's user avatar
  • 137
2 votes
2 answers
1k views

How to access document details from Doc2Vec similarity scores in gensim model?

I have been given a doc2vec model built with gensim, which was trained on 20 million documents. The 20 million documents it was trained on were also given to me, but I have no idea how or in which order the ...
User54211's user avatar
  • 121
2 votes
1 answer
307 views

Is this already a string similarity algorithm?

I'm unfamiliar with string similarity algorithms except for Levenshtein Distance because that's what I'm using and it has turned out to be less than ideal. So I've kind of got an idea of a recursive ...
MetaStack's user avatar
  • 3,434
2 votes
1 answer
69 views

What robust algorithm implementation can I use to perform phrase similarity with two inputs?

This is the problem: I have two columns in my metadata database, "field name" and "field description". I need to check if the "field description" is actually a description ...
emichester's user avatar
2 votes
1 answer
2k views

Is it possible to retrain Google's Universal Sentence Encoder such that it takes keywords into account when encoding sentences?

I am a bit confused on what it means to set trainable = True when loading the Universal Sentence Encoder 3. I have a small corpus (3000 different sentences), given a sentence I want to find the 10 ...
kspr's user avatar
  • 1,020
2 votes
1 answer
741 views

How to combine vectors generated by PV-DM and PV-DBOW methods of doc2vec?

I have around 20k documents of 60-150 words each. Out of these 20k documents, there are 400 documents for which the similar documents are known. These 400 documents serve as my test data. I am trying ...
Vikrant's user avatar
  • 139
2 votes
4 answers
1k views

How to find similar text in a large string?

I have a large string str and a needle ndl. Now, I need to find text similar to ndl in the string str. For example, SOURCE: "This is a demo text and I love you about this". NEEDLE: "I you ...
user373100's user avatar
2 votes
1 answer
1k views

How to use my own sentence embeddings in Keras?

I am new to Keras and I created my own tf_idf sentence embeddings with shape (no_sentences, embedding_dim). I am trying to add this matrix as input to an LSTM layer. My network looks something like ...
andra's user avatar
  • 23
2 votes
1 answer
967 views

How can I add new words to the WordNet dictionary?

I am trying to match two sentences and find similarities. It seems that some of the words (nouns) in my sentences are not present in the WordNet dictionary. How can I add them to WordNet?
Binoy Gupta's user avatar
2 votes
1 answer
3k views

Keras throws `'Tensor' object has no attribute '_keras_shape'` when splitting a layer output

I have sentence embedding output X of a sentence pair of dimension 2*1*300. I want to split this output into two vectors of shape 1*300 to calculate its absolute difference and product. x = ...
Aarthi's user avatar
  • 23
2 votes
1 answer
604 views

Elasticsearch ngram and PostgreSQL trigram search results do not match

I've created an index on Elasticsearch as below: "settings" : { "number_of_shards": 1, "number_of_replicas": 0, "analysis": { "filter": { "...
Ahmet Erkan ÇELİK's user avatar
2 votes
1 answer
88 views

Combine XML files based on entry similarity

I need to combine differently structured XML files using PHP. What I am doing is: Read the first XML file using simplexml_load_file(). Reformat the elements into a new structure using SimpleXMLElement() ...
Turab's user avatar
  • 182
2 votes
1 answer
109 views

String Similarity for all possible combination in Optimised fashion

I am facing a problem while finding string similarity. Scenario: the string consists of the following fields: first_name, middle_name and last_name. What I have to do is find string similarity ...
Akhilesh mahajan's user avatar
2 votes
1 answer
720 views

How to map word level timestamps to text of a given transcript?

I am currently developing a tool to visualize song lyrics. The tool computes the similarity in the phonetics of syllables and assigns a rhyme group to each syllable. Syllables belonging to the same ...
paulpelikan's user avatar
2 votes
0 answers
751 views

Transform TF universal-sentence-encoder to torch

Is there a way I can convert and use Google's universal-sentence-encoder (available through TF hub) in pytorch?
Maiia Bocharova's user avatar
2 votes
2 answers
422 views

semantic similarity for mix of languages

I have a database of several thousands of utterances. Each record (utterance) is a text representing a problem description, which a user has submitted to a service desk. Sometimes also the service ...
Data Man's user avatar
2 votes
1 answer
1k views

How to extract sentences that have similar meaning/intent compared against an example list of sentences

I have chat interactions [Utterances] between a Customer and an Advisor and would like to know if the advisor interactions contain particular sentences, or sentences similar to those in the list below: Example ...
baskarmac's user avatar
2 votes
1 answer
710 views

Cosine similarity is slow

I have a set of sentences, which are encoded into vectors using a sentence encoder, and I want to find the sentence most similar to an incoming query. The search function looks like the following: def ...
Jamik's user avatar
  • 75
1 vote
3 answers
6k views

Doc2Vec find the similar sentence

I am trying to find similar sentences using doc2vec. What I am not able to find is the actual matching sentence from the trained sentences. Below is the code from this article: from gensim.models....
Lolly's user avatar
  • 35.3k
1 vote
2 answers
2k views

How to speed up computing sentence similarity using spacy in Python?

I have the following code, which takes in 2 sentences and returns the similarity: nlp = spacy.load("en_core_web_md/en_core_web_md-3.2.0") def get_categories_nlp_sim(cat_1, cat_2): if (...
Tom's user avatar
  • 275
