How to use python-docx to replace text in a Word document and save
UPDATE: There are a couple of paragraph-level functions that do a good job of this and can be found on the GitHub site for python-docx
.
- This one will replace a regex-match with a replacement str. The replacement string will appear formatted the same as the first character of the matched string.
- This one will isolate a run such that some formatting can be applied to that word or phrase, like highlighting each occurence of "foobar" in the text or perhaps making it bold or appear in a larger font.
The current version of python-docx does not have a search()
function or a replace()
function. These are requested fairly frequently, but an implementation for the general case is quite tricky and it hasn't risen to the top of the backlog yet.
Several folks have had success though, getting done what they need, using the facilities already present. Here's an example. It has nothing to do with sections by the way :)
for paragraph in document.paragraphs:
if 'sea' in paragraph.text:
print paragraph.text
paragraph.text = 'new text containing ocean'
To search in Tables as well, you would need to use something like:
for table in document.tables:
for row in table.rows:
for cell in row.cells:
for paragraph in cell.paragraphs:
if 'sea' in paragraph.text:
paragraph.text = paragraph.text.replace("sea", "ocean")
If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that will remove any character-level formatting, like a word or phrase in bold or italic.
By the way, the code from @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.
I needed something to replace regular expressions in docx. I took scannys answer. To handle style I've used answer from: Python docx Replace string in paragraph while keeping style added recursive call to handle nested tables. and came up with something like this:
import re
from docx import Document
def docx_replace_regex(doc_obj, regex , replace):
for p in doc_obj.paragraphs:
if regex.search(p.text):
inline = p.runs
# Loop added to work with runs (strings with same style)
for i in range(len(inline)):
if regex.search(inline[i].text):
text = regex.sub(replace, inline[i].text)
inline[i].text = text
for table in doc_obj.tables:
for row in table.rows:
for cell in row.cells:
docx_replace_regex(cell, regex , replace)
regex1 = re.compile(r"your regex")
replace1 = r"your replace string"
filename = "test.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1 , replace1)
doc.save('result1.docx')
To iterate over dictionary:
for word, replacement in dictionary.items():
word_re=re.compile(word)
docx_replace_regex(doc, word_re , replacement)
Note that this solution will replace regex only if whole regex has same style in document.
Also if text is edited after saving same style text might be in separate runs. For example if you open document that has "testabcd" string and you change it to "test1abcd" and save, even dough its the same style there are 3 separate runs "test", "1", and "abcd", in this case replacement of test1 won't work.
This is for tracking changes in the document. To marge it to one run, in Word you need to go to "Options", "Trust Center" and in "Privacy Options" unthick "Store random numbers to improve combine accuracy" and save the document.