Pinecone Actor
This Actor is paid per event
Created in Dec 2024
Modified 18 days ago
Pinecone Integration
Integrate Apify Actors with Pinecone to seamlessly transfer and store data as vectors.
⚠️ Important: This repository is deprecated - code is actively maintained in the Apify vector database integrations repository
Explore how to utilize vector stores on the Apify platform by reading our blog post: Understanding Pinecone and Its Importance for Your LLMs.
Description
This integration is designed to process and store data vectors from various Apify Actors. It interfaces with OpenAI and Pinecone through langchain to perform the following steps:
- Retrieve the Actor's dataset using dataset_id (automatically passed in integration).
- Fetch the dataset using the Apify SDK.
- [Optional] Segment text data into chunks with langchain's RecursiveCharacterTextSplitter (parameters like chunk_size and chunk_overlap are customizable).
- Compute embeddings via OpenAI.
- Store the resulting vectors in Pinecone.
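To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-window chunking in plain Python. It is a simplified stand-in, not langchain's RecursiveCharacterTextSplitter, which additionally tries to break on separators such as paragraphs and sentences. The function name chunk_text is hypothetical and used for illustration only.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    illustration, not the algorithm langchain uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, so context that straddles a chunk boundary is present in both chunks. A larger chunk_overlap preserves more context at the cost of more vectors to embed and store.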
Before You Start
Ensure you have the following prerequisites for this integration:
- An OpenAI account and API token. Sign up for a free account at OpenAI.
- A Pinecone database with a valid API key (pinecone_token).
Inputs
Refer to the .actor/input_schema.json for detailed information:
- index_name: Name of the Pinecone index.
- pinecone_token: Your Pinecone access token (API key).
- openai_token: Your OpenAI API token.
- fields: Array of fields you want to save. For example, if you want to push the name and user.description fields, you should set this field to ["name", "user.description"].
- metadata_values: Object of metadata values you want to save. For example, if you want to push url and createdAt values to Pinecone, you should set this field to {"url": "https://www.apify.com", "createdAt": "2021-09-01"}.
- metadata_fields: Object of metadata fields you want to save. For example, if you want to push the url and createdAt fields, you should set this field to {"url": "url", "createdAt": "createdAt"}. If it has the same key as metadata_values, it's replaced.
- chunk_size: Maximum character length for each text chunk.
- chunk_overlap: Overlap in characters between consecutive text chunks.

The fields, metadata_values, and metadata_fields inputs support dot notation for nested data.
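As an illustration of dot notation, the sketch below resolves paths such as user.description against a nested dataset item. The helper get_nested and the sample item are hypothetical and are not taken from the Actor's source. How the Actor merges the result with metadata_values is not shown here.

```python
from typing import Any


def get_nested(item: dict, path: str, default: Any = None) -> Any:
    """Follow a dot-separated path (e.g. "user.description") through nested dicts."""
    current: Any = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # path does not exist in this item
        current = current[key]
    return current


# Hypothetical dataset item produced by some Actor.
item = {
    "name": "Apify",
    "url": "https://www.apify.com",
    "createdAt": "2021-09-01",
    "user": {"description": "Web scraping platform"},
}

# Input values in the same shape as the examples above.
fields = ["name", "user.description"]
metadata_fields = {"url": "url", "createdAt": "createdAt"}

# Values selected for embedding, keyed by their path.
selected = {path: get_nested(item, path) for path in fields}
# Metadata: each output key is filled from the item field named by its value.
metadata = {key: get_nested(item, path) for key, path in metadata_fields.items()}

print(selected)
print(metadata)
```

Returning a default for missing paths, rather than raising, keeps the sketch tolerant of dataset items that lack some fields. Whether the Actor itself skips or rejects such items is not stated here.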
Outputs
This integration saves selected fields from your Actor's output into your Pinecone database.
Community and Support
- Join our developer community on Discord to connect with other developers and discuss integrations.
- Visit Apify for tools that ingest comprehensive datasets from various sources and supply the data your large language models need.