Notepad delete duplicate lines
I have provided several possible solutions for your consideration. Please forgive me if I go over anything you already know. =)
TL;DR
As of Notepad++ v7.7.1, Notepad++ has a feature called Remove Consecutive Duplicate Lines which does the same thing as the other two solutions given below (i.e. it removes consecutive duplicate lines).
It can be accessed with Edit → Line Operations → Remove Consecutive Duplicate Lines.
Original Answer
Per the comment by @máté-juhász, the accepted answer to this StackOverflow Question will work with your example data.
In essence:
Open Search → Replace... ( Ctrl + H ) in Notepad++.
Under the "Find what:" field, enter the following regular expression:
^(.*?)$\s+?^(?=.*^\1$)
Leave the "Replace with:" field blank and make sure to mark "Regular expression" under the "Search Mode" options.
Once you are ready to remove your lines, click "Replace All".
Note that the original answer seems to indicate that the . matches newline
option should be checked but some people in the comments apparently had better luck leaving it unchecked. For your data, I left it unchecked and it seemed to work well.
ex. Using Regular Expressions
Using uniq
As an alternative, assuming no other option suites your needs, if you have a Windows port of the Unix-based uniq utility, you could possibly integrate this into your workflow with Notepad++.
In short, uniq
performs the same function as the regular expression above but in a potentially more reliable fashion. The downside is that incorporating it with Notepad++ is a bit of a hack. With that in mind, if you wish to give it a go, the basic steps are outlined below.
Getting uniq
To begin, you need a copy of uniq
for Windows. There may be several options available to you but, for simplicity, I might suggest the GnuWin32 CoreUtils package which includes uniq
. You can currently download a lightweight installer if you opt not to download and combine the zipped versions of the CoreUtils package components yourself.
As a tip, for every step in the solution involving uniq
, I would skip using paths with spaces. Unix often treats spaces in directory names differently than Windows, so utilities ported from that environment may have issues with them.
For reference, I am not sure what (if any) file size limits might apply to the GnuWin32 build of uniq
, but I often use it for text files with at least several megabytes of data (often several hundred thousand lines) with ease.
Using uniq With Notepad++
Once uniq
is installed, place something similar to the following lines in a batch file:
C:\path\to\uniq.exe %* > C:\temp\uniq_tmp.txt
notepad++ C:\temp\uniq_tmp.txt
exit()
Save this batch file in a permanent directory you are comfortable with. For reference purposes, I will call this uniq_npp.bat. Note that "temp" can be any folder, but "tmp" and "temp" often exist on Windows already. Likewise, "uniq_tmp.txt" can be any name you wish, so long as it is used consistently.
After saving uniq_npp.bat, we are then ready to integrate its functionality into Notepad++. To do this, open the Notepad++ Run... menu ( F5 ) and enter something similar to the following into the field that appears:
cmd /k C:\path\to\uniq_npp.bat "$(FULL_CURRENT_PATH)"
You can test your Notepad++ command before saving it if you click the left-most "Run" button.
ex. Run... Dialog
Otherwise, click "Save..." and name your command appropriately. You can give it a keyboard shortcut if you care to but it isn't required. Click "OK" to retain your command settings and place it in the Run... dropdown menu for later use.
ex. Run Dropdown Menu
Assuming it interests you, I have a very brief overview of the details of how the uniq
solution works in the "Notes" section at the end of this answer.
Caveats
One important thing to remember about this solutions with uniq
is that it absolutely requires a path to a file saved on disk (the document cannot be open in just Notepad++ alone).
This isn't an issue with an existing file you've opened, but if you create a new file or alter an existing original, you need to Save it first before running your uniq_npp.bat file. Otherwise, the operation will fail and any new data will not be sorted.
As a small advantage, its probably worth mentioning that this save limitation doesn't apply to the regular expression option above.
Notes
Sorting
The solutions offered (that is, the initial regular expression and uniq
) both require duplicate lines to appear directly above one another to be removed e.g.:
duplicate line X
duplicate line X
This means that sorting your data ahead of applying one of these operations is important. I am assuming you are already doing this given your example data, but it's worth mentioning anyway.
Notepad++ Macros
As a small suggestion, since Notepad++ doesn't have any actual keyboard shortcuts for its built-in line sorting operations, you may want to record a macro to help with sorting. Particularly, you can record an Edit → Select All ( Ctrl + A) operation and then choose one of the Edit → Line Operations → Sort Lines Lexicographically options.
For the uniq
solution, it may also be worth considering recording a "Save" operation as the final step to a sorting macro. Also note that the steps for the regular expression option (opening the Replace dialog, entering the regular expression, etc.), can be recorded to a handy macro as well.
How The uniq Solution Works
In brief:
The "Run..." line spawns a command window (
cmd /k
), calls uniq_npp.bat and gives it the path to wherever the current file you have selected is stored.In uniq_npp.bat, this path is captured via the
%*
wildcard passed touniq
. The de-duplicated data fromuniq
is then redirected (>
) to "uniq_tmp.txt".Lastly, the batch file opens this cleaned text in a new Notepad++ tab and the command window is closed via
exit()
.
uniq_npp.bat Improvements (?)
Regarding sorting, another option is to skip using Notepad++ to sort things all together. You potentially lose some flexibility in the process regarding sorting options, but you can just sort items as an extra step in your batch file via the Windows sort command. To add this step, you can modify the first line of uniq_npp.bat as follows:
sort %* | C:\path\to\uniq.exe > C:\temp\uniq_tmp.txt
This simply pipes the sorted data from sort
to uniq
. As you can see, sort
now initially captures the data path, rather than uniq
.
Another thought is to (possibly) use the %*
wildcard as part of a string operation to obtain the original file name and replace e.g. "uniq_tmp.txt" with something like "original-filename_uniq.txt" to make it more... unique.
Potential Pitfalls
By default, Windows
sort
will sort numbers as e.g.1 11 2 21
if they are not preceeded by 0 (e.g.
01, 02, 011, 021
).While the GnuWin32 CoreUtils package does come with a port of the Unix sort utility (which has more robust options than Windows
sort
), this particular implementation (unlike most of the GnuWin32 utilites) strikes me as a bit poor on Windows. However, if you use a different Windows port of the Unix version ofsort
, this issue may not apply and might prove to be a better option overall.