DynamoDB BatchWriteItem: Provided list of item keys contains duplicates
You've provided two or more items with identical partition/sort keys.
Per the BatchWriteItem docs, you cannot perform multiple operations on the same item in the same BatchWriteItem request.
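For example, one way to hit this with boto3 is to put the same key twice in one batch (a minimal sketch, assuming a hypothetical table my_table with partition key pk):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my_table')  # hypothetical table with partition key 'pk'

# Two put requests for the same key end up in one BatchWriteItem call,
# which DynamoDB rejects with the ValidationException above.
with table.batch_writer() as batch:
    batch.put_item(Item={'pk': 'item-1', 'value': 1})
    batch.put_item(Item={'pk': 'item-1', 'value': 2})  # duplicate key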
Consideration: this answer works for Python (boto3).
As @Benoit has remarked, the boto3 documentation states that if you want to bypass the no-duplication limitation of a single batch write request (botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the BatchWriteItem operation: Provided list of item keys contains duplicates.), you could specify overwrite_by_pkeys=['partition_key', 'sort_key'] on the batch writer to "de-duplicate request items in buffer if match new request item on specified primary keys", according to the documentation and the source code. That is, if the partition/sort key combination already exists in the buffer, the batch writer drops the buffered request and replaces it with the new one.
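As a minimal sketch of the fix (same hypothetical my_table with partition key pk), passing overwrite_by_pkeys makes the batch writer replace the buffered duplicate instead of sending both:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my_table')  # hypothetical table with partition key 'pk'

# The second put replaces the first one in the buffer, so only
# {'pk': 'item-1', 'value': 2} is actually sent to DynamoDB.
with table.batch_writer(overwrite_by_pkeys=['pk']) as batch:
    batch.put_item(Item={'pk': 'item-1', 'value': 1})
    batch.put_item(Item={'pk': 'item-1', 'value': 2})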
Example
Suppose there is a pandas DataFrame that you want to write to a DynamoDB table; the following function could be helpful:
import json
import datetime as dt
from decimal import Decimal
from typing import Optional

import boto3
import pandas as pd

def date_converter(obj):
    '''JSON serializer for datetime and date objects.'''
    if isinstance(obj, dt.datetime):
        return str(obj)
    elif isinstance(obj, dt.date):
        return obj.isoformat()

def write_dynamoDB(df: pd.DataFrame, tbl: str,
                   partition_key: Optional[str] = None,
                   sort_key: Optional[str] = None):
    '''
    Write a pandas DataFrame to a DynamoDB table through a batch write
    operation. If there are any float values, the data is round-tripped
    through JSON so that floats are parsed back as Decimal, the numeric
    type DynamoDB accepts.

    Arguments:
    * df: pandas DataFrame to write to the DynamoDB table.
    * tbl: DynamoDB table name.
    * partition_key (Optional): DynamoDB table partition key.
    * sort_key (Optional): DynamoDB table sort key.
    '''
    # Initialize the AWS resource
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(tbl)

    # Build the overwrite-key list, dropping any key that was not provided
    overwrite_keys = [k for k in (partition_key, sort_key) if k] or None

    # If there are floats, convert them to Decimal via a JSON round trip
    if any(v == 'float64' for v in df.dtypes.values):
        df_json = json.loads(
            json.dumps(df.to_dict(orient='records'),
                       default=date_converter,
                       allow_nan=True),
            parse_float=Decimal
        )
        # Batch write, de-duplicating on the provided keys
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            for element in df_json:
                batch.put_item(Item=element)
    else:  # No floats in the data
        # Batch write straight from the DataFrame rows
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            columns = df.columns
            for row in df.itertuples():
                batch.put_item(
                    Item={col: row[idx + 1] for idx, col in enumerate(columns)}
                )
You can then write the DataFrame by calling write_dynamoDB(dataframe, 'my_table', 'the_partition_key', 'the_sort_key').
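For instance, a minimal end-to-end sketch (assuming a hypothetical table my_table with partition key id and sort key ts; the column names are illustrative):

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'b'],        # partition key (note the repeated 'a')
    'ts': [1, 2, 3],              # sort key keeps the full keys unique
    'price': [1.5, 2.5, 3.5],     # float64 column, takes the Decimal path
})
write_dynamoDB(df, 'my_table', partition_key='id', sort_key='ts')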