Author: Atharv Katkar
Directed by: Sandeep Giri
OVERVIEW:
In Natural Language Processing (NLP), embeddings transform human language into numerical vectors. These are usually multi-dimensional arrays whose semantic meaning comes from the text corpus the model was trained on. The quality of these embeddings directly affects the performance of search engines, recommendation systems, chatbots, and more.
But here’s the problem:
Not all embeddings are created equal.
So how do we measure their quality?
To identify the quality of embeddings, I conducted an experiment:
I took three leading (free) text-to-embedding pretrained models, which work in different ways, fed each one a set of triplets, and computed the triplet loss to compare how well each model captures context.
1) Sentence-BERT (SBERT)
Transformer-based; captures deep sentence-level semantics:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
2) Universal Sentence Encoder (USE)
A TensorFlow Hub encoder; good general-purpose semantic encoding:
import tensorflow_hub as hub
model2 = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = model2(["Cardiac arrest"])
3) FastText (by Facebook AI)
Word-based; lightweight and fast, but lacks context:
import fasttext.util
fasttext.util.download_model('en', if_exists='ignore')
ft3 = fasttext.load_model('cc.en.300.bin')
vec = ft3.get_word_vector("Cardiac arrest")
When I compared the outputs, each model produced a vector of a different size:
(384,), (1, 512), and (300,) respectively.
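For reference, here is a minimal sketch of how those shapes can be checked, assuming the three models are loaded as model, model2, and ft3 from the snippets above:

import numpy as np
# Each model emits a differently shaped vector for the same input phrase.
sbert_vec = np.asarray(model.encode("Cardiac arrest"))   # shape (384,)
use_vec = model2(["Cardiac arrest"]).numpy()              # shape (1, 512)
ft_vec = ft3.get_word_vector("Cardiac arrest")            # shape (300,)
print(sbert_vec.shape, use_vec.shape, ft_vec.shape)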
GOALS
- Compare the models using a triplet-based evaluation approach built on triplet loss.
- Assess how well each model understands medical terminology.
CONCEPTS
What is Triplet Loss?
Triplet loss works with a 3-part input:
Anchor: The base sentence or phrase
Positive: A semantically similar phrase
Negative: A semantically absurd or unrelated phrase
| Anchor | Positive | Negative |
| --- | --- | --- |
| tuberculosis | lung infection | test tube accident |
| cardiac arrest | heart attack | cardi b arrest |
| asthma | respiratory condition | spiritual awakening |
The goal is to push the anchor close to the positive and far from the negative in embedding space.
TripletLoss = max(d(a, p) − d(a, n) + margin, 0)

- a = anchor vector
- p = positive vector (should be close to the anchor)
- n = negative vector (should be far from the anchor)
- d(x, y) = cosine distance
- margin = a buffer that forces the negative to be not just farther, but significantly farther
What is Cosine Similarity?
Cosine similarity is a measure of how similar two vectors are — based on the angle between them rather than their magnitude. In the context of NLP, vectors represent words or sentences as embeddings.
CosineSimilarity(A, B) = (A · B) / (||A|| · ||B||)
CosineDistance(A, B) = 1 − CosineSimilarity(A, B)
What is Margin?
The margin is a safety cushion.
If margin = 0.2, then even if the negative is slightly farther than the positive, the model still gets a penalty unless it’s at least 0.2 farther.
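Putting the three pieces together, here is a minimal sketch of the loss computation (cosine distance plus margin); flattening the inputs with np.ravel is an assumption I added to smooth over the different output shapes noted earlier:

import numpy as np

def cosine_distance(x, y):
    # 1 - cosine similarity; vectors are flattened so (1, 512) and (384,) both work.
    x, y = np.ravel(x), np.ravel(y)
    return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # TripletLoss = max(d(a, p) - d(a, n) + margin, 0)
    ap = cosine_distance(anchor, positive)   # anchor-positive distance (AP)
    an = cosine_distance(anchor, negative)   # anchor-negative distance (AN)
    return max(ap - an + margin, 0.0)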
Testing the Accuracy
Test set (see the linked test set for the full list of triplets)
We ran each model over a set of ~50 curated triplets and calculated:
- Anchor–Positive distance (AP)
- Anchor–Negative distance (AN)
- Triplet loss
We then visualized both per-triplet performance and overall averages. A sketch of the evaluation loop appears after the example triplets below.
("asthma", "respiratory condition", "spiritual awakening"),
("pneumonia", "lung infection", "foggy window"),
# General & Internal Medicine
("diabetes", "high blood sugar", "candy addiction"),
("arthritis", "joint inflammation", "rusty hinge"),
# ...and 50+ such examples
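A sketch of the evaluation loop over these triplets, reusing the cosine_distance and triplet_loss helpers above; encode is whatever function turns text into a vector for a given model (e.g. model.encode for SBERT), so the exact wiring here is an assumption:

def evaluate(encode, triplets, margin=0.2):
    # `triplets` is the list of (anchor, positive, negative) tuples shown above.
    # Returns mean AP distance, mean AN distance, and mean triplet loss.
    ap, an, losses = [], [], []
    for anchor, positive, negative in triplets:
        a, p, n = encode(anchor), encode(positive), encode(negative)
        ap.append(cosine_distance(a, p))
        an.append(cosine_distance(a, n))
        losses.append(triplet_loss(a, p, n, margin))
    return sum(ap) / len(ap), sum(an) / len(an), sum(losses) / len(losses)

# Example: SBERT
# print(evaluate(model.encode, triplets))
# USE returns a batch, so wrap it: evaluate(lambda t: model2([t]).numpy(), triplets)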
Results:

Using PCA, we visualized where each model placed the anchor, positive, and negative in space.
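As a rough sketch of that visualization (assuming vectors a_vec, p_vec, and n_vec for one triplet from a single model), PCA from scikit-learn projects the three embeddings down to two dimensions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_triplet(a_vec, p_vec, n_vec, title=""):
    # Stack the three embeddings and project onto their first two principal components.
    points = PCA(n_components=2).fit_transform(
        np.vstack([np.ravel(a_vec), np.ravel(p_vec), np.ravel(n_vec)]))
    for (x, y), label in zip(points, ["anchor", "positive", "negative"]):
        plt.scatter(x, y)
        plt.annotate(label, (x, y))
    plt.title(title)
    plt.show()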
[2D scatter plot of anchor, positive, and negative embeddings]
In the image below, you can see the anchor and positive clustering together, especially in SBERT's case, while the negative floats far away.



Per-triplet scores:

SBERT and USE performed well, as we can see from interpreting individual triplets and tracking the loss across the set.
CONCLUSION:
What We Learned
- SBERT is highly reliable for understanding sentence-level meaning
- USE performs reasonably well and is easy to use with TensorFlow
- FastText, while fast, struggles with context and full sentences
Visual Results
Average triplet loss (lower = better):
- SBERT: 0.0381
- USE: 0.0320
- FastText: 0.2175
If you're building search engines, recommendation systems, chatbots, or anything else involving meaning, good embeddings are key. Triplet loss is a simple yet powerful way to test how smart your model really is. I recommend using triplet loss during the model-selection stage of any NLP or context-based system to pick the optimal pretrained (or fine-tuned) model.
The source code is available if you want to reproduce the experiments. Good luck!