{"id":4286,"date":"2024-05-07T10:19:37","date_gmt":"2024-05-07T10:19:37","guid":{"rendered":"https:\/\/cloudxlab.com\/blog\/?p=4286"},"modified":"2024-05-09T07:02:39","modified_gmt":"2024-05-09T07:02:39","slug":"understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on","status":"publish","type":"post","link":"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/","title":{"rendered":"Understanding Embeddings and Matrices with the help of Sentiment Analysis and LLMs (Hands-On)"},"content":{"rendered":"\n<p>Imagine you&#8217;re browsing online and companies keep prompting you to rate and review your experiences. Have you ever wondered how these companies manage to process and make sense of the deluge of feedback they receive? Don&#8217;t worry! They don&#8217;t do it manually. This is where sentiment analysis steps in\u2014a technology that analyzes text to understand the emotions and opinions expressed within.<\/p>\n\n\n\n<p>Companies like <strong>Amazon<\/strong>, <strong>Airbnb<\/strong>, and others harness sentiment analysis to extract valuable insights. For example, Amazon refines product recommendations based on customer sentiments, while Airbnb analyzes reviews to enhance accommodations and experiences for future guests. <strong>Sentiment analysis silently powers these platforms<\/strong>, empowering businesses to better understand and cater to their customers&#8217; needs.<\/p>\n\n\n\n<p>Traditionally, companies like Amazon had to train complex models specifically for sentiment analysis. These models required significant time and resources to build and fine-tune. However, the game changed with Large Language Models like <strong>OpenAI&#8217;s ChatGPT, Google&#8217;s Gemini, Meta&#8217;s Llama<\/strong>, etc. 
which have revolutionized the landscape of natural language processing.<\/p>\n\n\n\n<p>Now, with Large Language Models (LLMs), sentiment analysis becomes remarkably easier. LLMs are exceptionally skilled at understanding the sentiment of text because they have been trained on vast amounts of language data, enabling them to understand the subtleties of human expression.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"1792\" height=\"1024\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/img-9HiLwYctaZhJCf1JjleJlHTl.png\" alt=\"\" class=\"wp-image-4336\"\/><figcaption>Generated from Dall E 3<\/figcaption><\/figure>\n\n\n\n<!--more-->\n\n\n\n<p>In this blog, we will embark on a journey where we will understand these concepts step by step. We&#8217;ll start from the very basics, assuming no prior mathematical expertise. By the end, you&#8217;ll not only grasp the mechanics of word embeddings and sentiment analysis but also understand the underlying math\u2014yes, even without needing to reach for your algebra book.<\/p>\n\n\n\n<p>Imagine this: we&#8217;ll explore how words can be visualized and transformed into points in space, where similar words cluster together, and we&#8217;ll unveil the secret sauce of how to measure the &#8220;distance&#8221; and &#8220;similarity&#8221; between these word-points. We&#8217;ll even delve into the Pythagorean theorem\u2014yes, the same one you might recall from school\u2014and see how it helps us calculate these distances in multiple dimensions. We&#8217;ll also perform hands-on sentiment analysis using LLMs.<\/p>\n\n\n\n<p>So buckle up! By the end of this journey, you&#8217;ll not only appreciate the beauty of embeddings and matrices but also gain a newfound confidence in handling the magic that powers modern language understanding. Let&#8217;s dive right in with our first stop: &#8220;What is a vector?&#8221;<\/p>\n\n\n\n<p>If you prefer video tutorials, we&#8217;ve got you covered too! 
Check out our accompanying video tutorial to dive deeper into the world of sentiment analysis and how it works with vectors and matrices.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube\"><div class=\"wp-block-embed__wrapper\">\n<div style=\"max-width: 1778px;\"><div style=\"left: 0; width: 100%; height: 0; position: relative; padding-bottom: 56.25%;\"><iframe title=\"Free Masterclass: Mastering Sentiment Analysis with Large Language Models (LLMs)\" src=\"\/\/if-cdn.com\/vDx7Hg1?maxheight=1000\" style=\"top: 0; left: 0; width: 100%; height: 100%; position: absolute; border: 0;\" allowfullscreen scrolling=\"no\" allow=\"encrypted-media *;\"><\/iframe><\/div><\/div><script type=\"text\/javascript\">window.addEventListener(\"message\",function(e){\n                window.parent.postMessage(e.data,\"*\");\n            },false);<\/script>\n<\/div><\/figure>\n\n\n\n<p>Let&#8217;s get started!<\/p>\n\n\n\n<h2>What is a Vector?<\/h2>\n\n\n\n<p>A vector is an essential concept in mathematics and physics. It&#8217;s like an arrow with a specific length (magnitude) and direction. Imagine a straight arrow in space. That arrow is a vector.<\/p>\n\n\n\n<h5>Components of a Vector:<\/h5>\n\n\n\n<ul><li><strong>Magnitude:<\/strong> This is just a fancy word for the vector&#8217;s length or size. Imagine measuring the length of the arrow from its tail to its tip.<\/li><li><strong>Direction:<\/strong> Think of this as the way the arrow points. Does it go up, down, left, right, or at some angle?<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"449\" height=\"221\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image.png\" alt=\"image.png\" class=\"wp-image-4294\"\/><\/figure>\n\n\n\n<p>In mathematical terms, we represent vectors using coordinates. 
For example:<\/p>\n\n\n\n<ul><li>In 2D, a vector can be written as <strong>v = (x, y)<\/strong>, where <strong>x<\/strong> and <strong>y<\/strong> represent the horizontal (x-axis) and vertical (y-axis) components, respectively.<\/li><li>In 3D, vectors are represented as <strong>v = (x, y, z)<\/strong>, incorporating a third dimension.<\/li><li>And, in an N-dimensional space, a vector can be expressed as <strong>v = (x\u2081, x\u2082, x\u2083, &#8230;, x\u2099)<\/strong>, extending into multiple dimensions.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"800\" height=\"629\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-1.png\" alt=\"image.png\" class=\"wp-image-4295\"\/><\/figure>\n\n\n\n<h5>Vector Notation:<\/h5>\n\n\n\n<p>In most standard vector representations, the tail of the vector is placed at the origin (0, 0). This means that the coordinates of the vector&#8217;s endpoint are relative to the origin.<\/p>\n\n\n\n<p>The coordinate (2, 3) in the coordinate system is represented as:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"392\" height=\"278\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-2.png\" alt=\"image.png\" class=\"wp-image-4296\"\/><\/figure>\n\n\n\n<p>The above point (2, 3) in vector space is represented by movement from the origin. 
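As a quick aside, the magnitude of this vector (the arrow's length from tail to tip) can be computed directly from its components; a minimal Python sketch for the vector (2, 3):

```python
import math

# The vector (2, 3): 2 units along the x-axis, 3 units along the y-axis
x, y = 2, 3

# The magnitude is sqrt(x**2 + y**2); math.hypot computes exactly that
magnitude = math.hypot(x, y)
print(round(magnitude, 3))  # prints 3.606
```

The direction of the arrow can likewise be recovered from the components with `math.atan2(y, x)`.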
It means you move 2 units to the right (along the x-axis) and 3 units up (along the y-axis) from the origin (0, 0).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"379\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-3.png\" alt=\"image.png\" class=\"wp-image-4297\"\/><\/figure>\n\n\n\n<p>In the realm of technology and data science, vectors serve as the foundation for a wide range of applications, from computer graphics to machine learning.<\/p>\n\n\n\n<p>For instance, in image processing, each pixel in a digital image can be represented as a vector of color values, allowing for complex operations like image manipulation and feature extraction.<\/p>\n\n\n\n<p>Also, if we consider the world of natural language processing, text data, such as sentences or documents, can be represented as vectors using sophisticated techniques like <strong>word embeddings<\/strong>. These embeddings encode semantic meaning and context into numerical vectors, enabling computers to process and understand human language more effectively. Let&#8217;s understand them in more detail.<\/p>\n\n\n\n<h2>Word Embeddings<\/h2>\n\n\n\n<p>Imagine you have a big book filled with lots of words. Each word has its own unique meaning, and when we use these words in sentences, they help us communicate ideas and thoughts. Now, word embeddings are like a super-smart way of representing these words in a special numerical way that a computer can understand.<\/p>\n\n\n\n<h5><strong>What are Word Embeddings?<\/strong><\/h5>\n\n\n\n<p>At its core, a word embedding is a numerical representation of a word in a multi-dimensional space. Each word is mapped to a vector of real numbers, where the values in this vector encode semantic and syntactic information about the word. 
This transformation allows us to analyze and manipulate words using mathematical operations, opening up a wealth of possibilities for NLP tasks.<\/p>\n\n\n\n<p>So, just as we humans know the context of the word &#8220;king&#8221; \u2014 where it is used and what it means \u2014 by representing words as word embeddings, machines also gain an understanding of the context of words. Therefore, we can say that machines can understand language with the help of word embeddings.<\/p>\n\n\n\n<p>The word &#8220;king&#8221; can be represented as a vector of numbers, typically with a length of, for example, 150 dimensions. This vector might look something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">king=[0.1,0.05,0.2,\u2026,150&nbsp;numbers]<\/pre>\n\n\n\n<p>Each number in this vector carries specific information about the word &#8220;king&#8221; based on its usage and context in a large corpus of text.<\/p>\n\n\n\n<p>The length of the vector (e.g., 150 dimensions) determines the level of detail and complexity in representing the word. More dimensions can potentially capture richer semantic information but require more computational resources.<\/p>\n\n\n\n<h5>How Do Word Embeddings Work?<\/h5>\n\n\n\n<p>To grasp the concept better, let&#8217;s visualize words as points in a high-dimensional space. <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"897\" height=\"342\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-4.png\" alt=\"image.png\" class=\"wp-image-4309\"\/><\/figure>\n\n\n\n<p>In the above image, we can see that the words that share similar contexts or meanings in language are positioned closer together in this space. For instance, words like &#8220;king&#8221; and &#8220;queen&#8221; would be located near each other because they often appear in similar contexts (royalty, monarchy, etc.).  
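To make this idea of closeness concrete, here is a toy sketch using made-up 2-D coordinates (real embeddings have hundreds of dimensions; these particular numbers are purely illustrative):

```python
import math

# Hypothetical 2-D "embeddings" -- invented values for illustration only
toy_embeddings = {
    "king":  (2.0, 3.0),
    "queen": (2.2, 3.1),
    "apple": (8.0, 0.5),
}

# Euclidean distance between two points
d_king_queen = math.dist(toy_embeddings["king"], toy_embeddings["queen"])
d_king_apple = math.dist(toy_embeddings["king"], toy_embeddings["apple"])

# "king" sits much nearer to "queen" than to "apple"
print(d_king_queen < d_king_apple)  # prints True
```

In real embedding spaces the same comparison is done in hundreds or thousands of dimensions, but the intuition is identical: smaller distance means more closely related words.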
<\/p>\n\n\n\n<p>For &#8220;walk&#8221; and &#8220;walking&#8221;, these words would be positioned closely together in the embedding space because they share a related meaning and often appear in similar contexts, representing the action of moving on foot. Similarly, &#8220;Canada&#8221; and &#8220;Ottawa&#8221; would be situated near each other in the embedding space due to their semantic relationship, reflecting that Ottawa is the capital city of Canada.<\/p>\n\n\n\n<p>But that&#8217;s not all. The magic lies in the mathematical relationships between these word vectors. By performing vector operations, such as addition or subtraction, we can uncover intriguing linguistic insights. <\/p>\n\n\n\n<p>Imagine we have vectors representing words like &#8220;king,&#8221; &#8220;man,&#8221; &#8220;woman,&#8221; and &#8220;queen.&#8221; When we subtract the vector for &#8220;man&#8221; from &#8220;king&#8221; and then add the vector for &#8220;woman,&#8221; the resulting vector is very close to the vector for &#8220;queen.&#8221;<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">king - man + woman \u2248 queen<\/pre>\n\n\n\n<p>This ability to perform calculations with words helps computers grasp subtle meanings and relationships, making language processing more powerful and easier to understand.<\/p>\n\n\n\n<h2>Visualizing Word Embeddings<\/h2>\n\n\n\n<p>Let&#8217;s dive into the exciting world of word embeddings by visualizing them. <\/p>\n\n\n\n<h5>Setting up the environment<\/h5>\n\n\n\n<p>We will be using OpenAIEmbeddings for our purpose. 
To use it, you will need an OpenAI API key.<\/p>\n\n\n\n<p>Follow the below steps to generate OpenAI API Key:<\/p>\n\n\n\n<ol><li>First, create an&nbsp;<a target=\"_blank\" href=\"https:\/\/platform.openai.com\/signup\" rel=\"noreferrer noopener\">OpenAI account<\/a>&nbsp;or&nbsp;<a target=\"_blank\" href=\"https:\/\/platform.openai.com\/login\" rel=\"noreferrer noopener\">sign in<\/a>.<\/li><li>Next, navigate to the&nbsp;<a target=\"_blank\" href=\"https:\/\/platform.openai.com\/account\/api-keys\" rel=\"noreferrer noopener\">API key page<\/a>&nbsp;and &#8220;Create new secret key&#8221;, optionally naming the key. Make sure to save this somewhere safe and do not share it with anyone.<\/li><\/ol>\n\n\n\n<p>Once you have your key, we are ready to go.<\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\" style=\"flex-basis:100%\">\n<p>You&#8217;ll need to install some Python packages to get started with the project. If you want to bypass the technical hurdles of setting up complex environments with libraries and frameworks, you can use our cloud lab which equips you with the prepared environment for building Generative AI and LLM apps, where you do not need to waste any time in configurations, but can directly start learning. You also get access to hands-on projects such as \u201c<strong><a href=\"https:\/\/cloudxlab.com\/assessment\/playlist-intro\/3101\/project-building-a-rag-chatbot-from-your-website-d\">Building a RAG Chatbot from Your Website Data using OpenAI and Langchain<\/a><\/strong>\u201c and others to learn Generative AI in a hands-on way. 
Check out <a href=\"https:\/\/cloudxlab.com\/blog\/building-generative-ai-and-llms-with-cloudxlab\/\">Building Generative AI and LLMs with CloudxLab<\/a> for further details.<\/p>\n\n\n\n<h5>Step 1: Setting up OpenAI API Key<\/h5>\n\n\n\n<p>To begin, you&#8217;ll need to set up your OpenAI API key as an environment variable in your Python script:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import os\nos.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"<\/pre>\n\n\n\n<p>Replace <code>\"YOUR_API_KEY\"<\/code> with your actual OpenAI API key.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h5>Step 2: Retrieving Embeddings<\/h5>\n\n\n\n<p>Let&#8217;s write a function that takes a word or sentence as an input and returns its embeddings.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">from openai import OpenAI\n \nclient = OpenAI()\n \ndef get_openai_embedding(text, model=\"text-embedding-ada-002\"):\n    text = text.replace(\"\\n\", \" \")\n    return client.embeddings.create(input = [text], model=model).data[0].embedding<\/pre>\n\n\n\n<p>We are using &#8220;<strong>text-embedding-ada-002<\/strong>&#8221; model for Embeddings. 
You can check out more embedding models from OpenAI at <a href=\"https:\/\/platform.openai.com\/docs\/guides\/embeddings\/embedding-models\">OpenAI Embeddings models<\/a>.<\/p>\n\n\n\n<h5>Step 3: Understanding Embeddings<\/h5>\n\n\n\n<p>Let&#8217;s see what word embeddings look like by retrieving the embedding for the word &#8220;queen&#8221;:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">queen = get_openai_embedding(\"queen\")\nprint(\"Length of queen\", len(queen))\nprint(queen)<\/pre>\n\n\n\n<p>The output looks like:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"985\" height=\"195\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/Screenshot-2024-05-07-at-1.49.18-PM.png\" alt=\"\" class=\"wp-image-4316\"\/><\/figure>\n\n\n\n<p>The resulting <code>embedding<\/code> is a vector representation of the word &#8220;queen&#8221; in a high-dimensional space (1536 dimensions for <strong>text-embedding-ada-002<\/strong>).<\/p>\n\n\n\n<h5>Step 4: Writing a function to visualize words<\/h5>\n\n\n\n<p>The retrieved embeddings are high-dimensional (e.g., 1536 dimensions), making it challenging to directly visualize them. 
Hence, we use PCA (Principal Component Analysis) to reduce the dimensionality to 2D for visualization purposes.<\/p>\n\n\n\n<p>Let&#8217;s start with importing necessary libraries.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>import<\/strong> matplotlib.pyplot <strong>as<\/strong> plt\n<strong>from<\/strong> sklearn.decomposition <strong>import<\/strong> PCA <\/pre>\n\n\n\n<p>Now, define a function to visualize word embeddings in 2D space:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">def visualize_pca_2d(embeddings, words):\n    pca_2d = PCA(n_components=2)\n    embeddings_2d = pca_2d.fit_transform(embeddings)\n\n    # Create a 2D scatter plot\n    plt.figure(figsize=(10, 6))\n    plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], marker='o')\n    for i, word in enumerate(words):\n        plt.annotate(word, (embeddings_2d[i, 0], embeddings_2d[i, 1]))\n\n    plt.xlabel(\"Principal Component 1\")\n    plt.ylabel(\"Principal Component 2\")\n    plt.title(\"2D Visualization of Word Embeddings\")\n    plt.grid(True)\n    plt.show()<\/pre>\n\n\n\n<p>Now, let&#8217;s visualize embeddings for a list of words:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">words = ['cat', 'dog', 'bike', 'kitten', 'puppy', 'bicycle', 'aeroplane', 'helicopter', 'cow', 'wolf', 'lion', 'fighter jet']\nembeddings = []\nfor i in words:\n    embeddings.append(get_openai_embedding(i))\nvisualize_pca_2d(embeddings, words)<\/pre>\n\n\n\n<p>We get the plot as:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"642\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-5.png\" alt=\"\" class=\"wp-image-4317\"\/><\/figure>\n\n\n\n<p>In the above plot, we observe meaningful clusters that reflect semantic relationships between words. Animal-related terms like &#8220;lion,&#8221; &#8220;cow,&#8221; &#8220;cat,&#8221; &#8220;dog,&#8221; and &#8220;wolf&#8221; form a distinct group due to their shared semantic context. 
Similarly, air transportation-related words such as &#8220;aeroplane,&#8221; &#8220;helicopter,&#8221; and &#8220;fighter jet&#8221; cluster together, indicating common associations related to transportation. Additionally, pet names like &#8220;kitten&#8221; and &#8220;puppy&#8221; are closely grouped, highlighting their shared attributes as young animals. Interestingly, the points in the transportation cluster sit closer together than those in the animal cluster, suggesting stronger semantic relationships within that category.<\/p>\n\n\n\n<p>Feel free to explore more word combinations and share your observations in the comments below!<\/p>\n\n\n\n<h2>Pythagorean Theorem<\/h2>\n\n\n\n<p>Remember our school friend from maths, Pythagoras? Well, he gifted us a marvelous discovery known as the Pythagorean Theorem. This fundamental theorem holds the key to understanding the relationships within right-angled triangles. Let&#8217;s delve into the magic of right triangles and how this theorem works its wonders!<\/p>\n\n\n\n<h3>Understanding the Pythagorean Theorem<\/h3>\n\n\n\n<p>In a right-angled triangle, the Pythagorean Theorem states that the square of the length of the hypotenuse (the side opposite the right angle) is equal to the sum of the squares of the lengths of the other two sides.<\/p>\n\n\n\n<p>Mathematically, for a right triangle with sides of lengths &#8216;a&#8217; and &#8216;b&#8217;, and a hypotenuse of length &#8216;c&#8217;, the Pythagorean Theorem can be expressed as:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong><span class=\"has-inline-color has-medium-gray-color\">c<sup>2<\/sup> = a<sup>2<\/sup> + b<sup>2<\/sup><\/span><\/strong><\/pre>\n\n\n\n<p>Let&#8217;s bring this theorem to life with a practical example and a visual diagram.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"499\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-6.png\" alt=\"image.png\" 
class=\"wp-image-4321\"\/><\/figure>\n\n\n\n<p>In the above plot, we can see that we have three sides, with<strong> c = 5 <\/strong>units, <strong>a = 4 <\/strong>units and <strong>b = 3<\/strong> units.<\/p>\n\n\n\n<p>Now, let&#8217;s prove the Pythagorean Theorem using our example: <strong>c<sup>2<\/sup> = a<sup>2<\/sup> + b<sup>2<\/sup><\/strong><\/p>\n\n\n\n<div class=\"wp-block-columns\">\n<div class=\"wp-block-column\" style=\"flex-basis:100%\">\n<pre class=\"wp-block-verse\"><span class=\"has-inline-color has-medium-gray-color\">c<sup>2<\/sup> = a<sup>2<\/sup> + b<sup>2<\/sup>\n5<sup>2<\/sup> = 4<sup>2<\/sup> + 3<sup>2<\/sup>\n25 = 16 + 9\n25 = 25<\/span><\/pre>\n<\/div>\n<\/div>\n\n\n\n<p>Voila! The theorem holds true, showcasing the beautiful symmetry and relationship between the sides of a right triangle.<\/p>\n\n\n\n<h2>Extending the Pythagorean Theorem: Calculating Distance in 2D Space<\/h2>\n\n\n\n<p>Let&#8217;s dive in and follow along with the code in your notebook!<\/p>\n\n\n\n<h3 id=\"Step-1:-Two-Points---Define-the-Points\">Step 1: Two Points &#8211; Define the Points<\/h3>\n\n\n\n<p>Imagine two points in a 2-D space. These points can represent anything from physical locations to abstract quantities.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">point1 <strong>=<\/strong> (1, 2)\npoint2 <strong>=<\/strong> (4, 6)<\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"497\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-7.png\" alt=\"image.png\" class=\"wp-image-4322\"\/><\/figure>\n\n\n\n<h3 id=\"Step-2:-Connecting-a-Line---Calculate-Differences-and-Hypotenuse\">Step 2: Connecting a Line &#8211; Calculate Differences and Hypotenuse<\/h3>\n\n\n\n<p>To measure the distance between these points, connect them with a straight line. 
This line represents the shortest path between the two points, equivalent to the hypotenuse of a right-angled triangle.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"497\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-8.png\" alt=\"image.png\" class=\"wp-image-4323\"\/><\/figure>\n\n\n\n<h3>Step 3: Calculate the differences in coordinates<\/h3>\n\n\n\n<p>To measure the distance between two points in a 2-dimensional space, we start by calculating the differences in their x-coordinates (horizontal) and y-coordinates (vertical).<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">side1 <strong>=<\/strong> point2[0] <strong>-<\/strong> point1[0]\nside2 <strong>=<\/strong> point2[1] <strong>-<\/strong> point1[1]\nprint(f\"Length of side1 is {side1} units\")\nprint(f\"Length of side2 is {side2} units\")<\/pre>\n\n\n\n<p>We get the output as: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">Length of side1 is 3 units\nLength of side2 is 4 units<\/span><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"497\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-9.png\" alt=\"image.png\" class=\"wp-image-4324\"\/><\/figure>\n\n\n\n<p>As observed in the plot above, it resembles a right-angled triangle, where the straight line connecting the two points acts as the hypotenuse, and the calculated differences (side1 and side2) represent the lengths of the other two sides. 
We&#8217;ve already computed the lengths of these sides.<\/p>\n\n\n\n<p>Now, let&#8217;s apply the Pythagorean Theorem to calculate the length of the hypotenuse, which will determine the distance between these two points.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import math\n\nhypotenuse_square <strong>=<\/strong> side1<strong>**<\/strong>2 <strong>+<\/strong> side2<strong>**<\/strong>2\ndistance_2d <strong>=<\/strong> math<strong>.<\/strong>sqrt(hypotenuse_square)\nprint(f\"The length of the hypotenuse, i.e., distance between two points is {distance_2d} units\")<\/pre>\n\n\n\n<p>The output of the above code is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">The length of the hypotenuse, i.e., distance between two points is 5.0 units<\/span><\/pre>\n\n\n\n<p>So for two dimensions, we can also represent the distance as:<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container\"><p>point1 = (1, 2)<br \/>\npoint2 = (4, 6)<br \/>\ndistance = hypotenuse<br \/>\ndistance = \\(\\sqrt{hypotenuse^2}\\)<br \/>\ndistance = \\(\\sqrt{side1^2 + side2^2}\\)<br \/>\ndistance = \\(\\sqrt{(point2[0] &#8211; point1[0])^2 + (point2[1] &#8211; point1[1])^2}\\)<\/p>\n<\/div><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\">distance_2d <strong>=<\/strong> ((point2[0] <strong>-<\/strong> point1[0])<strong>**<\/strong>2 <strong>+<\/strong> (point2[1] <strong>-<\/strong> point1[1])<strong>**<\/strong>2)<strong>**<\/strong>0.5\nprint(\"Distance between 2D points:\", distance_2d)<\/pre>\n\n\n\n<p>The output for the above code is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">Distance between 2D points: 5.0 <\/span><\/pre>\n\n\n\n<h3>Step 4: Deriving Euclidean Distance<\/h3>\n\n\n\n<p>Now, let&#8217;s represent the coordinates in terms of x and y:<\/p>\n\n\n<p>point1 = (1, 2)<br \/>\npoint2 = (4, 6)<br \/>\n\\({x_2}\\) = point2[0] = 4<br \/>\n\\({x_1}\\) = 
point1[0] = 1<br \/>\n\\({y_2}\\) = point2[1] = 6<br \/>\n\\({y_1}\\) = point1[1] = 2<br \/>\ndistance = \\(\\sqrt{(x_2 \u2013 x_1)^2 + (y_2 \u2013 y_1)^2}\\)<br \/>\ndistance = \\(\\sqrt{(4 \u2013 1)^2 + (6 \u2013 2)^2}\\)<br \/>\ndistance = \\(\\sqrt{9 + 16}\\)<br \/>\ndistance = \\(\\sqrt{25}\\)<br \/>\ndistance = 5<\/p>\n\n\n\n<p>This is called the Euclidean distance.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"521\" height=\"329\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-10.png\" alt=\"image.png\" class=\"wp-image-4328\"\/><\/figure>\n\n\n\n<h2>Extending the Pythagorean Theorem to Three Dimensions<\/h2>\n\n\n\n<p>Now, let&#8217;s extend this idea into three dimensions, where distances involve not only length and width but also height.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"622\" height=\"437\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-11.png\" alt=\"Screenshot%202023-11-08%20at%2012.01.24%20PM.png\" class=\"wp-image-4339\"\/><\/figure>\n\n\n\n<p>In the above figure, distance &#8216;e&#8217; would be the distance between point 1 &amp; point 2. 
We could determine it using Pythagorean theorem as seen previously, but we first need to find the value of &#8216;d&#8217; using values &#8216;a&#8217; and &#8216;b&#8217;.<\/p>\n\n\n\n<p>Using Pythagorean theorem, we know that for triangle ABD, <\/p>\n\n\n\n<pre id=\"block-c5fe8ca1-9e7e-42c4-92fc-76f9ad47023d\" class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">d<sup>2<\/sup> = a<sup>2<\/sup> + b<sup>2<\/sup><\/span><\/pre>\n\n\n\n<p>And for triangle ECD, <\/p>\n\n\n\n<pre id=\"block-c5fe8ca1-9e7e-42c4-92fc-76f9ad47023d\" class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">e<sup>2<\/sup> = d<sup>2<\/sup> + c<sup>2<\/sup><\/span><\/pre>\n\n\n\n<p>If we substitute&nbsp;the value of d<sup>2<\/sup> in this equation, it becomes:<\/p>\n\n\n\n<pre id=\"block-c5fe8ca1-9e7e-42c4-92fc-76f9ad47023d\" class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">e<sup>2<\/sup> = a<sup>2<\/sup> + b<sup>2<\/sup> + c<sup>2<\/sup><\/span><\/pre>\n\n\n<p>e = \\(\\sqrt{a^2+b^2 + c^2}\\)<\/p>\n\n\n\n<p>So, now we can use this formula to calculate the length of &#8216;e&#8217;. 
Let&#8217;s do it.<\/p>\n\n\n\n<h5>Step 1: Define two Points in 3D<\/h5>\n\n\n\n<pre class=\"wp-block-preformatted\">point1 <strong>=<\/strong> (1, 2, 3)\npoint2 <strong>=<\/strong> (4, 6, 8)<\/pre>\n\n\n\n<h5>Step 2: Calculate the differences in coordinates<\/h5>\n\n\n\n<pre class=\"wp-block-preformatted\">delta_x <strong>=<\/strong> point2[0] <strong>-<\/strong> point1[0]\ndelta_y <strong>=<\/strong> point2[1] <strong>-<\/strong> point1[1]\ndelta_z <strong>=<\/strong> point2[2] <strong>-<\/strong> point1[2]\n\ndistance_3d <strong>=<\/strong> (delta_x<strong>**<\/strong>2 <strong>+<\/strong> delta_y<strong>**<\/strong>2 <strong>+<\/strong> delta_z<strong>**<\/strong>2)<strong>**<\/strong>0.5\nprint(\"Distance between 3D points:\", distance_3d)<\/pre>\n\n\n\n<p>The output for the above code is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">Distance between 3D points: 7.0710678118654755 <\/span><\/pre>\n\n\n\n<p>So we are able to calculate the distance between two points in three dimensions. 
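As a sanity check, Python 3.8+ ships `math.dist` in the standard library, which computes the same Euclidean distance in a single call:

```python
import math

point1 = (1, 2, 3)
point2 = (4, 6, 8)

# math.dist returns the Euclidean distance between two equal-length points
distance_3d = math.dist(point1, point2)
print("Distance between 3D points:", distance_3d)  # prints 7.0710678118654755
```

This works for points of any dimension, as long as both points have the same number of coordinates.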
Again, we can represent the distance as:<\/p>\n\n\n<p>point1 = (1, 2, 3)<br \/>\npoint2 = (4, 6, 8)<br \/>\ndistance = hypotenuse<br \/>\ndistance = \\(\\sqrt{hypotenuse^2}\\)<br \/>\ndistance = \\(\\sqrt{\\Delta x^2 + \\Delta y^2 + \\Delta z^2}\\)<br \/>\ndistance = \\(\\sqrt{(point2[0] &#8211; point1[0])^2 + (point2[1] &#8211; point1[1])^2 + (point2[2] &#8211; point1[2])^2}\\)<\/p>\n\n\n\n<p>In code, it looks like:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">distance_3d <strong>=<\/strong> ((point2[0] <strong>-<\/strong> point1[0])<strong>**<\/strong>2 <strong>+<\/strong> (point2[1] <strong>-<\/strong> point1[1])<strong>**<\/strong>2 <strong>+<\/strong> (point2[2] <strong>-<\/strong> point1[2])<strong>**<\/strong>2)<strong>**<\/strong>0.5\n\nprint(\"Distance between 3D points:\", distance_3d)<\/pre>\n\n\n\n<p>The output for the above code is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">Distance between 3D points: 7.0710678118654755 <\/span><\/pre>\n\n\n\n<p>So the Euclidean distance in 3-D becomes:<\/p>\n\n\n<p>distance = \\(\\sqrt{(x_2 \u2013 x_1)^2 + (y_2 \u2013 y_1)^2 + (z_2 \u2013 z_1)^2}\\)<\/p>\n\n\n\n<h2>Extending the Pythagorean Theorem to n Dimensions<\/h2>\n\n\n\n<p>The same pattern extends to any number of dimensions, so we can write a general function:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em># Calculate the Euclidean distance<\/em>\n<strong>def<\/strong> calculate_distance(point1, point2):\n    distance_nd <strong>=<\/strong> sum((x <strong>-<\/strong> y) <strong>**<\/strong> 2 <strong>for<\/strong> x, y <strong>in<\/strong> zip(point2, point1))<strong>**<\/strong>0.5\n    <strong>return<\/strong> distance_nd<\/pre>\n\n\n\n<p>Let&#8217;s create two points in 6 dimensions.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">point1 <strong>=<\/strong> (1, 2, 3, 4, 7, 4)\npoint2 <strong>=<\/strong> (4, 6, 8, 10, 8, 6)\ncalculate_distance(point1, point2)<\/pre>\n\n\n\n<p>The output of the above code is:<\/p>\n\n\n\n<pre 
class=\"wp-block-preformatted\">9.539392014169456<\/pre>\n\n\n\n<p>So, Euclidean distance for n-d becomes as<\/p>\n\n\n<p>distance = \\(\\sqrt{(x_2 \u2013 x_1)^2 + (y_2 \u2013 y_1)^2 + (z_2 \u2013 z_1)^2 + (a_2 \u2013 a_1)^2 + &#8230;&#8230;&#8230;&#8230;}\\)<\/p>\n\n\n\n<p>where x, y, z, a &#8230;.., each represent a dimension\/axis.<\/p>\n\n\n\n<p>Or we can say that, for two points p and q in n-dimension,<\/p>\n\n\n<p>d(p, q) = \\(\\sqrt{(q_1 \u2013 p_1)^2 + (q_2 \u2013 p_2)^2 + (q_3 \u2013 p_3)^2 + (q_4 \u2013 p_4)^2 + &#8230;&#8230;&#8230;&#8230;}\\)<\/p>\n\n\n\n<p>which comes as,<\/p>\n\n\n<p>distance = \\(\\sqrt{\\sum_{i=1}^{n} (q_i &#8211; p_i)^2 }\\)<\/p>\n\n\n\n<p>where &#8216;<strong>i&#8217;<\/strong> represents a dimension and &#8216;<strong>n&#8217;<\/strong> total number of dimensions.<\/p>\n\n\n\n<p>Now, we can use this formula to calculate the distance between two embeddings, as they are N-dimensional vectors. As we know, the closer two embeddings are, the more similar they are.<\/p>\n\n\n\n<h2>Calculating distance between two embeddings<\/h2>\n\n\n\n<p>First, let&#8217;s compute the embeddings of four words.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">cat <strong>=<\/strong> get_openai_embedding('cat')\ndog <strong>=<\/strong> get_openai_embedding('dog')\ncar <strong>=<\/strong> get_openai_embedding('car')\nbike <strong>=<\/strong> get_openai_embedding('bike')<\/pre>\n\n\n\n<p>Let&#8217;s use the <code>calculate_distance()<\/code> function to compute distances between various word embeddings:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">distance_cat_dog <strong>=<\/strong> calculate_distance(cat, dog)\ndistance_cat_car <strong>=<\/strong> calculate_distance(cat, car)\ndistance_cat_bike <strong>=<\/strong> calculate_distance(cat, bike)\ndistance_bike_car <strong>=<\/strong> calculate_distance(bike, car)\ndistance_bike_dog <strong>=<\/strong> calculate_distance(bike, dog)<\/pre>\n\n\n\n<p>Now, let&#8217;s see how the distance looks like 
between these embeddings:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">print(f\"Distance between 'cat' and 'dog': {distance_cat_dog:.2f}\")\nprint(f\"Distance between 'cat' and 'car': {distance_cat_car:.2f}\")\nprint(f\"Distance between 'cat' and 'bike': {distance_cat_bike:.2f}\")\nprint(f\"Distance between 'bike' and 'car': {distance_bike_car:.2f}\")\nprint(f\"Distance between 'bike' and 'dog': {distance_bike_dog:.2f}\")<\/pre>\n\n\n\n<p>The output of the above code is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">Distance between 'cat' and 'dog': 0.52\nDistance between 'cat' and 'car': 0.56\nDistance between 'cat' and 'bike': 0.61\nDistance between 'bike' and 'car': 0.54\nDistance between 'bike' and 'dog': 0.57<\/span><\/pre>\n\n\n\n<p>We can see that &#8216;cat&#8217; is closer to &#8216;dog&#8217; than to &#8216;car&#8217; or &#8216;bike&#8217;. Also, &#8216;bike&#8217; is closer to &#8216;car&#8217; than to &#8216;dog&#8217; or &#8216;cat&#8217;. This tells us the function is working as expected.<\/p>\n\n\n\n<h2>Sentiment Analysis<\/h2>\n\n\n\n<p>In sentiment analysis, we want to understand whether a text expresses positive or negative feelings. Here&#8217;s a simple way to do it using word embeddings and basic geometry:<\/p>\n\n\n\n<h5>What We Do:<\/h5>\n\n\n\n<ol><li><strong>Choose Sentiment Words<\/strong>: We pick specific words that clearly indicate positive or negative sentiments, like &#8216;positive&#8217; and &#8216;negative&#8217;. For each of these words, we get their embeddings.<ul><li><strong>Positive Sentiment<\/strong>: We find the embedding for the word &#8216;positive&#8217; (<code>embd_positive<\/code>).<\/li><li><strong>Negative Sentiment<\/strong>: Similarly, we find the embedding for &#8216;negative&#8217; (<code>embd_neg<\/code>).<\/li><\/ul><\/li><li><strong>Calculate Distance<\/strong>: Now, when we have a review we want to analyze, we also get its embedding. 
We then calculate the distance between the review&#8217;s embedding and the embeddings of our sentiment words (<code>embd_positive<\/code> and <code>embd_neg<\/code>).<\/li><li><strong>Make a Decision<\/strong>: Based on these distances:<ul><li>If the review&#8217;s embedding is closer to <code>embd_positive<\/code> than to <code>embd_neg<\/code>, we say it&#8217;s a positive review.<\/li><li>Otherwise, if it&#8217;s closer to <code>embd_neg<\/code>, we say it&#8217;s negative.<\/li><\/ul><\/li><\/ol>\n\n\n\n<p>Seems confusing? Let&#8217;s make it concrete with a hands-on example.<\/p>\n\n\n\n<p>We start by obtaining embeddings for the key sentiment words\u2014&#8216;positive&#8217; and &#8216;negative&#8217;:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">embd_positive <strong>=<\/strong> get_openai_embedding('positive')\nembd_neg <strong>=<\/strong> get_openai_embedding('negative')<\/pre>\n\n\n\n<p>Next, we calculate embeddings for two straightforward reviews:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">embeddings_good_review <strong>=<\/strong> get_openai_embedding('The product is amazing')\nembeddings_bad_review <strong>=<\/strong> get_openai_embedding('The product is not good')<\/pre>\n\n\n\n<p>Now, let&#8217;s use <code>visualize_pca_2d()<\/code> from before to visualize them.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sent_embeddings <strong>=<\/strong> np<strong>.<\/strong>array([embeddings_good_review, embeddings_bad_review, embd_positive, embd_neg])\nvisualize_pca_2d(sent_embeddings, words <strong>=<\/strong> ['The product is amazing', 'The product is not good','positive', 'negative'])<\/pre>\n\n\n\n<p>The above code outputs:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"709\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-12.png\" alt=\"\" class=\"wp-image-4345\"\/><\/figure>\n\n\n\n<p>We can clearly see that the embedding of &#8216;The product is amazing&#8217; is closer to the embedding of 
&#8216;positive&#8217; than to that of &#8216;negative&#8217;. This shows that our approach works.<\/p>\n\n\n\n<p>Now, armed with these embeddings, we can create a simple function <code>sentiment(review)<\/code> to classify reviews as positive or negative based on their proximity to our sentiment embeddings:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> sentiment(review):\n    embed_review <strong>=<\/strong> get_openai_embedding(review)\n    dist_pos <strong>=<\/strong> calculate_distance(embed_review, embd_positive)\n    dist_neg <strong>=<\/strong> calculate_distance(embed_review, embd_neg)\n    <strong>if<\/strong> dist_pos <strong>&lt;<\/strong> dist_neg:\n        print(\"It is a positive review\")\n        <strong>return<\/strong> <strong>True<\/strong>\n    <strong>else<\/strong>:\n        print(\"It is a negative review\")\n        <strong>return<\/strong> <strong>False<\/strong><\/pre>\n\n\n\n<p>Let&#8217;s put our <code>sentiment()<\/code> function to the test with more complex reviews:<\/p>\n\n\n\n<pre id=\"block-8f8093e2-e3a8-4271-91e8-04a8605e7d9f\" class=\"wp-block-preformatted\">sentiment(\"Camera quality is too worst. Don't buy this if you want good photos. We can get this quality pictures with 5000 rupees android phone. I am totally disappointed because I expected range of iPhone camera quality but not this.. Waste of money.\")<\/pre>\n\n\n\n<p>The output of the above code is: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">It is a negative review<\/span><\/pre>\n\n\n\n<pre id=\"block-8f8093e2-e3a8-4271-91e8-04a8605e7d9f\" class=\"wp-block-preformatted\">sentiment(\"Worth to buy it. If you are managed money buy then buy it it never feels you waste of money. Performance. Hand in feel. Camera quality at flagship level\")<\/pre>\n\n\n\n<p>The output of the above code is: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">It is a positive review <\/span><\/pre>\n\n\n\n<p>Let&#8217;s try out more tricky reviews.<\/p>\n\n\n\n<pre id=\"block-8f8093e2-e3a8-4271-91e8-04a8605e7d9f\" class=\"wp-block-preformatted\">sentiment(\"At first it seemed like a great product but my expectations were changed completed.\")<\/pre>\n\n\n\n<p>The output of the above code is: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">It is a negative review<\/span><\/pre>\n\n\n\n<pre id=\"block-8f8093e2-e3a8-4271-91e8-04a8605e7d9f\" class=\"wp-block-preformatted\">sentiment(\"At first, it seemed like a bad product but it met my expectations.\")<\/pre>\n\n\n\n<p>The output of the above code is: <\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><span class=\"has-inline-color has-medium-gray-color\">It is a positive review<\/span><\/pre>\n\n\n\n<p>Our <code>sentiment()<\/code> function demonstrates a straightforward yet effective approach to sentiment analysis using word embeddings. By leveraging pre-trained embeddings and simple distance calculations, we can accurately classify reviews based on their underlying sentiment. Feel free to experiment with different reviews and observe how the function performs in classifying sentiments!<\/p>\n\n\n\n<h2>Dot product between two vectors<\/h2>\n\n\n\n<p>So far, we&#8217;ve been using Euclidean distance to measure similarity between vectors. However, another powerful method for gauging similarity is through the dot product.<\/p>\n\n\n\n<p>The dot product, also known as the scalar product, is a way to measure how much two vectors are aligned or point in the same direction. 
Imagine vectors as arrows in space, and the dot product tells us how much one arrow overlaps with another.<\/p>\n\n\n\n<p>Mathematically, the dot product of two vectors A and B is represented as A \u00b7 B and is calculated as follows:<\/p>\n\n\n\n<p>A = (A1, A2, A3, &#8230;&#8230;, An)<\/p>\n\n\n\n<p>B = (B1, B2, B3, &#8230;&#8230;, Bn)<\/p>\n\n\n\n<p>A \u00b7 B = A1 * B1 + A2 * B2 + A3 * B3 + &#8230; + An * Bn<\/p>\n\n\n\n<p>So suppose two vectors are<\/p>\n\n\n\n<p>A = (1,2,3)<\/p>\n\n\n\n<p>B = (4,5,6)<\/p>\n\n\n\n<p>A.B = (A[0] * B[0]) + (A[1] * B[1]) + (A[2] * B[2])<\/p>\n\n\n\n<p>A.B = (1 * 4) + (2 * 5) + (3 * 6)<\/p>\n\n\n\n<p>A.B = 32<\/p>\n\n\n\n<h5 id=\"What-the-Dot-Product-Represents?\">What Does the Dot Product Represent?<\/h5>\n\n\n\n<p>The dot product gives us a number that represents how similar or aligned two vectors are. If the dot product is large and positive, the vectors point in similar directions. If it is near zero, the vectors are close to perpendicular, and if it is negative, they point in opposing directions.<\/p>\n\n\n\n<p>Let&#8217;s understand it with the help of visualization.<\/p>\n\n\n\n<p>Consider two vectors as:<\/p>\n\n\n\n<p>A = (2,3)<\/p>\n\n\n\n<p>B = (4,2)<\/p>\n\n\n\n<p>So, if we plot them, they will look like:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"379\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-14.png\" alt=\"download.png\" class=\"wp-image-4350\"\/><\/figure>\n\n\n\n<p>Imagine dropping a perpendicular from the tip of vector A (2, 3) onto the line containing vector B (4, 2). The segment from the origin to the foot of that perpendicular is the projection of A onto B (represented by the dotted red line). 
The length of this projection (the dotted segment in the image), multiplied by the length of B, gives the dot product.<\/p>\n\n\n\n<p>So, A.B will look like:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-15.png\" alt=\"download%20copy.png\"\/><\/figure>\n\n\n\n<h3>Applying Dot Product to Sentiment Analysis<\/h3>\n\n\n\n<p>Now, let&#8217;s leverage the dot product for sentiment analysis. Everything remains the same, except that we use the dot product instead of Euclidean distance. Note that the comparison flips: a larger dot product means greater similarity, so we now check for the greater value.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> sentiment_using_dot_product(review):\n    embed_review <strong>=<\/strong> get_openai_embedding(review)\n    sim_pos <strong>=<\/strong> np<strong>.<\/strong>dot(embed_review, embd_positive)\n    sim_neg <strong>=<\/strong> np<strong>.<\/strong>dot(embed_review, embd_neg)\n    <strong>if<\/strong> sim_pos <strong>&gt;<\/strong> sim_neg:\n        print(\"It is a positive review\")\n        <strong>return<\/strong> <strong>True<\/strong>\n    <strong>else<\/strong>:\n        print(\"It is a negative review\")\n        <strong>return<\/strong> <strong>False<\/strong><\/pre>\n\n\n\n<p>Let&#8217;s try a review in Hindi to see if it works.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sentiment_using_dot_product(\"ye product kaafi achha hai\")\n<\/pre>\n\n\n\n<p>The output is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">It is a positive review<\/pre>\n\n\n\n<p>Let&#8217;s try a negative review.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">sentiment_using_dot_product(\"ye product kaafi kharaab hai\")\n<\/pre>\n\n\n\n<p>The output is:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">It is a negative review<\/pre>\n\n\n\n<p>So, that&#8217;s a key benefit of using LLM embeddings: they can understand sentiment across different languages. 
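To see the mechanics without calling the OpenAI API, here is a toy sketch of the same decision rule using made-up 2-D vectors as stand-ins for real embeddings (the numbers and vector values are invented purely for illustration):

```python
import numpy as np

# Invented 2-D stand-ins for real embeddings (illustration only)
embd_positive = np.array([0.9, 0.1])
embd_neg = np.array([0.1, 0.9])

def classify(embed_review):
    # A larger dot product means the review vector is more aligned
    # with that sentiment direction
    sim_pos = np.dot(embed_review, embd_positive)
    sim_neg = np.dot(embed_review, embd_neg)
    return "positive" if sim_pos > sim_neg else "negative"

print(classify(np.array([0.8, 0.2])))  # positive
print(classify(np.array([0.3, 0.7])))  # negative
```

With real embeddings, the review vector would come from `get_openai_embedding(review)` exactly as in the function above; only the vectors change, not the comparison.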
<\/p>\n\n\n\n<h2>Cosine Similarity<\/h2>\n\n\n\n<p>The dot product of two vectors measures their similarity in terms of both direction and magnitude. However, the dot product itself lacks a bounded range, making it less intuitive as a direct similarity metric. To address this limitation, we can use cosine similarity, which is a normalized version of the dot product.<\/p>\n\n\n\n<p>Now let&#8217;s revisit the plot from before:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"379\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-14.png\" alt=\"download.png\" class=\"wp-image-4350\"\/><\/figure>\n\n\n\n<p>From trigonometry, we know that: <code>cos(\u03b8) = adjacent side \/ hypotenuse<\/code><\/p>\n\n\n\n<p>Here, the hypotenuse is the length of A, i.e. |A|, and the adjacent side is the length of the projection of A onto B, which equals (A \u00b7 B) \/ |B|. Therefore, we have:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"379\" height=\"387\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-15.png\" alt=\"download%20copy.png\" class=\"wp-image-4351\"\/><\/figure>\n\n\n\n<p>That makes cos \u03b8 = (A \u00b7 B) \/ (|A| * |B|)<\/p>\n\n\n\n<p>This is called <strong>cosine similarity<\/strong>, another similarity measure for vectors. Here we normalize the dot product by dividing it by the product of the magnitudes (or norms) of the vectors. The resulting similarity score ranges between -1 (perfectly opposite directions) and 1 (perfectly aligned directions). 
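These boundary values are easy to verify numerically; a minimal sketch (assuming NumPy and simple 2-D vectors chosen for illustration):

```python
import numpy as np

def cos_sim(a, b):
    # Normalized dot product: (A . B) / (|A| * |B|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
print(cos_sim(a, np.array([2.0, 0.0])))   # 1.0  -> same direction
print(cos_sim(a, np.array([0.0, 3.0])))   # 0.0  -> perpendicular
print(cos_sim(a, np.array([-1.0, 0.0])))  # -1.0 -> opposite direction
```

Notice that scaling a vector (e.g. (2, 0) versus (5, 0)) leaves the score unchanged; that magnitude-independence is exactly what the normalization buys us.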
That makes cosine similarity more interpretable and suitable for various similarity measurement tasks.<\/p>\n\n\n\n<p>If the vectors point in the same direction, the cosine similarity is 1 (cos 0\u00b0 = 1).<\/p>\n\n\n\n<p>If the vectors are orthogonal (perpendicular), the cosine similarity is 0 (cos 90\u00b0 = 0).<\/p>\n\n\n\n<p>If the vectors point in opposite directions, the cosine similarity is -1 (cos 180\u00b0 = -1).<\/p>\n\n\n\n<p>Values in between indicate different degrees of similarity.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img width=\"824\" height=\"339\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/image-16.png\" alt=\"image.png\" class=\"wp-image-4352\"\/><\/figure>\n\n\n\n<p>Let&#8217;s code it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>def<\/strong> cosine_similarity(A, B):\n    dot_product <strong>=<\/strong> np<strong>.<\/strong>dot(A, B)\n\n    <em># Calculate the magnitudes |A| and |B|<\/em>\n    magnitude_A <strong>=<\/strong> np<strong>.<\/strong>linalg<strong>.<\/strong>norm(A)\n    magnitude_B <strong>=<\/strong> np<strong>.<\/strong>linalg<strong>.<\/strong>norm(B)\n\n    <em># Calculate the cosine similarity<\/em>\n    similarity <strong>=<\/strong> dot_product <strong>\/<\/strong> (magnitude_A <strong>*<\/strong> magnitude_B)\n    <strong>return<\/strong> similarity<\/pre>\n\n\n\n<p>As homework, try performing sentiment analysis using cosine similarity as the similarity measure.<\/p>\n\n\n\n<h2>Conclusion<\/h2>\n\n\n\n<p>Embarking on this journey into sentiment analysis with word embeddings has not only provided practical insights but has also highlighted the elegance and simplicity of leveraging embeddings for natural language understanding.<\/p>\n\n\n\n<p>As you continue to explore the world of embeddings and NLP, remember that the applications extend far beyond sentiment analysis. 
From machine translation to text summarization and beyond, embeddings serve as foundational tools that underpin many advanced techniques in the field of natural language processing.<\/p>\n\n\n\n<p>We encourage you to experiment further, refine your understanding, and delve deeper into the possibilities that embeddings offer. Whether you&#8217;re a seasoned practitioner or just beginning your NLP journey, the realm of word embeddings invites exploration and innovation.<\/p>\n\n\n\n<p><strong>Ready to take a deep dive into generative AI?<\/strong>&nbsp;Consider enrolling in our course,&nbsp;<a href=\"https:\/\/cloudxlab.com\/course\/204\/hands-on-generative-ai-with-langchain-and-python\">Hands-on Generative AI with Langchain and Python<\/a>&nbsp;on CloudxLab. This course will equip you with the skills to build powerful generative models using Python and Langchain!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/cloudxlab.com\/course\/204\/hands-on-generative-ai-with-langchain-and-python\"><img width=\"1189\" height=\"422\" src=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/03\/Screenshot-2024-02-29-at-1.55.59-PM.png\" alt=\"\" class=\"wp-image-4226\"\/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Imagine you&#8217;re browsing online and companies keep prompting you to rate and review your experiences. Have you ever wondered how these companies manage to process and make sense of the deluge of feedback they receive? Don&#8217;t worry! They don&#8217;t do it manually. 
This is where sentiment analysis steps in\u2014a technology that analyzes text to understand &hellip; <a href=\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Understanding Embeddings and Matrices with the help of Sentiment Analysis and LLMs (Hands-On)&#8221;<\/span><\/a><\/p>\n","protected":false},"author":36,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Understanding Embeddings and Matrices with the help of Sentiment Analysis and LLMs (Hands-On) | CloudxLab Blog<\/title>\n<meta name=\"description\" content=\"We&#039;ll explore how words can be visualized and transformed into points in space, where similar words cluster together, and we&#039;ll unveil the secret sauce of how to measure the &quot;distance&quot; and &quot;similarity&quot; between these word-points.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Understanding Embeddings and Matrices with the help of Sentiment Analysis and LLMs (Hands-On) | CloudxLab Blog\" \/>\n<meta property=\"og:description\" content=\"We&#039;ll explore how words can be visualized and transformed into points in space, where similar words cluster together, and we&#039;ll unveil the secret sauce of how to measure the &quot;distance&quot; and &quot;similarity&quot; 
between these word-points.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/\" \/>\n<meta property=\"og:site_name\" content=\"CloudxLab Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cloudxlab\" \/>\n<meta property=\"article:published_time\" content=\"2024-05-07T10:19:37+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-09T07:02:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/img-9HiLwYctaZhJCf1JjleJlHTl.png\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:site\" content=\"@CloudxLab\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\">\n\t<meta name=\"twitter:data1\" content=\"25 minutes\">\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"CloudxLab Blog\",\"description\":\"Learn AI, Machine Learning, Deep Learning, Devops &amp; Big Data\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/cloudxlab.com\/blog\/?s={search_term_string}\",\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/img-9HiLwYctaZhJCf1JjleJlHTl.png\",\"contentUrl\":\"https:\/\/blog.cloudxlab.com\/wp-content\/uploads\/2024\/05\/img-9HiLwYctaZhJCf1JjleJlHTl.png\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#webpage\",\"url\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/\",\"name\":\"Understanding Embeddings and Matrices with the help of Sentiment Analysis and LLMs (Hands-On) | CloudxLab Blog\",\"isPartOf\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#primaryimage\"},\"datePublished\":\"2024-05-07T10:19:37+00:00\",\"dateModified\":\"2024-05-09T07:02:39+00:00\",\"author\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4438d405318314ec50940bde93ef548a\"},\"description\":\"We'll explore how words can be visualized and transformed into points in space, where similar words cluster together, and we'll unveil the secret sauce of how to measure the \\\"distance\\\" and \\\"similarity\\\" between these 
word-points.\",\"breadcrumb\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"item\":{\"@type\":\"WebPage\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/\",\"url\":\"https:\/\/cloudxlab.com\/blog\/\",\"name\":\"Home\"}},{\"@type\":\"ListItem\",\"position\":2,\"item\":{\"@id\":\"https:\/\/cloudxlab.com\/blog\/understanding-embeddings-and-matrices-with-the-help-of-sentiment-analysis-and-llms-hands-on\/#webpage\"}}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#\/schema\/person\/4438d405318314ec50940bde93ef548a\",\"name\":\"Shubh Tripathi\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/cloudxlab.com\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/76bb13891affbf9da48fa9701d774ff0?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/76bb13891affbf9da48fa9701d774ff0?s=96&d=mm&r=g\",\"caption\":\"Shubh Tripathi\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/4286"}],"collection":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/users\/36"}],"replies":[{"embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/comments?post=4286"}],"version-history":[{"count":43,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/4286\/revisions"}],"predecessor-version":[{"id":4357,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/posts\/4286\/revisions\/4357"}],"wp:attachment":[{"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/media?parent=4286"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/categories?post=4286"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cloudxlab.com\/blog\/wp-json\/wp\/v2\/tags?post=4286"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}