We'll use the MarkdownTextSplitter from LangChain to split the text into smaller segments. Each segment is then wrapped in a Document object, which serves as one retrievable piece of knowledge for your chatbot.
Text splitters have two main parameters: chunk size and chunk overlap. Together they control the granularity of the split. The chunk size is the maximum length of a chunk, so a smaller chunk size produces more chunks and a larger one produces fewer. The chunk overlap is how much text consecutive chunks share: with a larger overlap, neighbouring chunks repeat more of each other's characters, and with a smaller overlap they repeat less.
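To make the effect of the two parameters concrete, here is a minimal sketch of a fixed-size character splitter in plain Python. It is an illustration only: the function name split_with_overlap is hypothetical, and LangChain's splitters work differently (they prefer to break on separators such as Markdown headings and paragraphs), so real chunk counts will differ.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Hypothetical helper: cut text into fixed-size windows.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"  # 26 characters

# Smaller chunk size -> more chunks; larger chunk size -> fewer chunks.
small = split_with_overlap(text, chunk_size=5, chunk_overlap=0)
large = split_with_overlap(text, chunk_size=13, chunk_overlap=0)

# With overlap, the end of one chunk is repeated at the start of the next.
overlapped = split_with_overlap(text, chunk_size=10, chunk_overlap=4)

print(len(small), len(large))
print(overlapped)
```

In this toy version the overlap simply shortens the step between windows, which is why a larger overlap also tends to increase the number of chunks.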
Importing Libraries:
from langchain.text_splitter import MarkdownTextSplitter
from langchain.docstore.document import Document
Define a function text_to_docs that takes text and metadata as input and returns a list of Documents:
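def text_to_docs(text, metadata):
    doc_chunks = []
    # Split along Markdown structure into chunks of at most 2048 characters,
    # with up to 128 characters shared between consecutive chunks.
    text_splitter = MarkdownTextSplitter(chunk_size=2048, chunk_overlap=128)
    chunks = text_splitter.split_text(text)
    for chunk in chunks:
        # Wrap each chunk in a Document carrying the supplied metadata.
        doc = Document(page_content=chunk, metadata=metadata)
        doc_chunks.append(doc)
    return doc_chunks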
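Note that text_to_docs attaches the same metadata dictionary to every chunk. If each chunk should be traceable to its position in the source, one option is to copy the metadata per chunk and add an index. The sketch below shows the idea with a stand-in Doc dataclass in place of LangChain's Document, so it runs without LangChain installed. The names Doc, chunks_to_docs, the chunk_index key, and the guide.md filename are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def chunks_to_docs(chunks, metadata):
    """Hypothetical variant of the loop in text_to_docs with per-chunk metadata."""
    docs = []
    for i, chunk in enumerate(chunks):
        # Build a new dict so each document owns its metadata,
        # and record the chunk's position in the original text.
        chunk_metadata = {**metadata, "chunk_index": i}
        docs.append(Doc(page_content=chunk, metadata=chunk_metadata))
    return docs


# Two already-split Markdown chunks and a hypothetical source filename.
docs = chunks_to_docs(
    ["# Intro\nHello", "## Setup\nInstall"],
    {"source": "guide.md"},
)
for doc in docs:
    print(doc.metadata, repr(doc.page_content))
```

Copying the dictionary also means that changing one document's metadata later does not silently change the others.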