DynamoDB BatchWriteItem: Provided list of item keys contains duplicates
You've provided two or more items with identical partition/sort keys.
Per the BatchWriteItem docs, you cannot perform multiple operations on the same item in the same BatchWriteItem request.
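For example, one way to hit this with boto3 is to put the same key twice in one batch (a minimal sketch, assuming a hypothetical table my_table with partition key pk):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my_table')  # hypothetical table with partition key 'pk'

# Two put requests for the same key end up in one BatchWriteItem call,
# which DynamoDB rejects with the ValidationException above.
with table.batch_writer() as batch:
    batch.put_item(Item={'pk': 'item-1', 'value': 1})
    batch.put_item(Item={'pk': 'item-1', 'value': 2})  # duplicate key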
Consideration: this answer works for Python (boto3).
As @Benoit has remarked, the boto3 documentation states that if you want to bypass the no-duplication limitation of a single batch write request (botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the BatchWriteItem operation: Provided list of item keys contains duplicates.), you could specify overwrite_by_pkeys=['partition_key', 'sort_key'] on the batch writer to "de-duplicate request items in buffer if match new request item on specified primary keys", according to the documentation and the source code. That is, if the partition/sort key combination already exists in the buffer, the batch writer drops the buffered request and replaces it with the new one.
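As a minimal sketch of the fix (same hypothetical my_table with partition key pk), passing overwrite_by_pkeys makes the batch writer replace the buffered duplicate instead of sending both:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my_table')  # hypothetical table with partition key 'pk'

# The second put replaces the first one in the buffer, so only
# {'pk': 'item-1', 'value': 2} is actually sent to DynamoDB.
with table.batch_writer(overwrite_by_pkeys=['pk']) as batch:
    batch.put_item(Item={'pk': 'item-1', 'value': 1})
    batch.put_item(Item={'pk': 'item-1', 'value': 2})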
Example
Suppose there is a pandas DataFrame that you want to write to a DynamoDB table; the following function could be helpful:
import json
import datetime as dt
from decimal import Decimal
from typing import Optional

import boto3
import pandas as pd

def date_converter(obj):
    '''JSON serializer for datetime and date objects.'''
    if isinstance(obj, dt.datetime):
        return str(obj)
    elif isinstance(obj, dt.date):
        return obj.isoformat()

def write_dynamoDB(df: pd.DataFrame, tbl: str,
                   partition_key: Optional[str] = None,
                   sort_key: Optional[str] = None):
    '''
    Write a pandas DataFrame to a DynamoDB table through a batch write
    operation. If there are any float values, the data is round-tripped
    through JSON so that floats are parsed back as Decimal, the numeric
    type DynamoDB accepts.

    Arguments:
    * df: pandas DataFrame to write to the DynamoDB table.
    * tbl: DynamoDB table name.
    * partition_key (Optional): DynamoDB table partition key.
    * sort_key (Optional): DynamoDB table sort key.
    '''
    # Initialize the AWS resource
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(tbl)

    # Build the overwrite-key list, dropping any key that was not provided
    overwrite_keys = [k for k in (partition_key, sort_key) if k] or None

    # If there are floats, convert them to Decimal via a JSON round trip
    if any(v == 'float64' for v in df.dtypes.values):
        df_json = json.loads(
            json.dumps(df.to_dict(orient='records'),
                       default=date_converter,
                       allow_nan=True),
            parse_float=Decimal
        )
        # Batch write, de-duplicating on the provided keys
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            for element in df_json:
                batch.put_item(Item=element)
    else:  # No floats in the data
        # Batch write straight from the DataFrame rows
        with table.batch_writer(overwrite_by_pkeys=overwrite_keys) as batch:
            columns = df.columns
            for row in df.itertuples():
                batch.put_item(
                    Item={col: row[idx + 1] for idx, col in enumerate(columns)}
                )
You can then write the DataFrame by calling write_dynamoDB(dataframe, 'my_table', 'the_partition_key', 'the_sort_key').
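For instance, a minimal end-to-end sketch (assuming a hypothetical table my_table with partition key id and sort key ts; the column names are illustrative):

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'a', 'b'],        # partition key (note the repeated 'a')
    'ts': [1, 2, 3],              # sort key keeps the full keys unique
    'price': [1.5, 2.5, 3.5],     # float64 column, takes the Decimal path
})
write_dynamoDB(df, 'my_table', partition_key='id', sort_key='ts')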