Programmatically convert notebook input cells to text file
As Kuba says in the comments, NotebookImport does what you need. Specifically, NotebookImport[nb, "Input" -> "Text"] preserves the original formatting and syntax of input cells, which should fit your purposes exactly. You can Riffle the resulting list of strings with a delimiter to distinguish the individual cells in the exported file:
StringJoin[Riffle[NotebookImport[nb, _ -> "Text"], "\n=================\n"]]
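To see what the Riffle/StringJoin step produces without opening a notebook, here is a minimal sketch on a hand-written list of strings. The list cells is a hypothetical stand-in for the output of NotebookImport; StringRiffle (available from version 10.1) is shown as an equivalent shortcut.

```mathematica
(* hypothetical stand-in for the list returned by NotebookImport[nb, "Input" -> "Text"] *)
cells = {"1 + 1", "Integrate[x^2, x]", "Plot[Sin[x], {x, 0, Pi}]"};
delim = "\n=================\n";

(* Riffle puts the delimiter between consecutive cells; StringJoin concatenates the result *)
joined = StringJoin[Riffle[cells, delim]];

(* StringRiffle performs both steps in one call and should give the same string *)
joined === StringRiffle[cells, delim]
```

The delimiter appears only between cells, not after the last one, so a file with n cells contains n - 1 delimiter lines.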
It is easy to automate the conversion:
nbFileNames = FileNames["*.nb"];
Do[
  Export[name <> ".txt",
    StringJoin[Riffle[NotebookImport[name, _ -> "Text"], "\n=================\n"]],
    "Text"],
  {name, nbFileNames}]
UPDATE
Here is an improved solution which correctly handles special and Unicode characters as well as 2D typesetting, and encodes everything using the linear syntax (tutorial/StringRepresentationOfBoxes):
nbFileNames = FileNames["*.nb"];
Do[
  Export[name <> ".txt",
    StringJoin[Riffle[
      ToString[#, InputForm, CharacterEncoding -> "PrintableASCII"] & /@
        NotebookImport[name, "Input" -> "Text"],
      "\n=================\n"]],
    "Text"],
  {name, nbFileNames}]
The linear syntax has the advantage of being a universal, exact, and platform-independent representation of a string (which may contain rich formatting and 2D typesetting), at the expense of substantially reduced readability.
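As a small illustration of the encoding step used in the UPDATE code, the sketch below applies the same ToString call to a single string containing a non-ASCII character. The sample string s is made up for the example. Because the argument is a string, InputForm also wraps the result in double quotes, so the exported cells appear quoted in the file.

```mathematica
(* a made-up sample string with one non-ASCII character *)
s = "\[Alpha]^2 + 1";

(* InputForm with PrintableASCII escapes non-ASCII characters and adds surrounding quotes *)
encoded = ToString[s, InputForm, CharacterEncoding -> "PrintableASCII"];

(* check that every character of the encoded string is 7-bit ASCII *)
asciiOnly = Max[ToCharacterCode[encoded]] <= 127;

(* check that the encoded form can be read back into the original string *)
roundTrip = ToExpression[encoded] === s;

{asciiOnly, roundTrip}
```

Both checks are expected to hold, which is what makes the exported file safe to move between platforms and to read back later.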
Updated to support multiline input cells and syntax errors
I would avoid linear syntax. Why not use something like the following:
inputCellsToText[nb_, textFile_] := Internal`InheritedBlock[
  {SequenceForm},
  SetAttributes[SequenceForm, HoldAll];
  Export[
    textFile,
    Flatten[extractInput /@ NotebookImport[nb, "Input" -> "HeldInterpretedCell"]],
    "Text"
  ]
]
extractInput[HoldComplete[ExpressionCell[a_List, __]]] := DeleteCases[
  SequenceForm /@ Unevaluated[a],
  SequenceForm[Null]
]
extractInput[HoldComplete[ExpressionCell[a_, __]]] := SequenceForm[a]
For example, the following notebook:
nb = Uncompress @"1:eJzVVc1rE0EUj7WxftBaD4onWQ9KQmrNbkIa24gkaSJVYyRbVMTLJp1tR9OZMDtrGw/iWfQqFRT8I6SXinryIChexVJQ60F6Kd48CM6bbD42TfMh9eDlzb437/P33rw9nqc50+vxeKy9glymHOUpvW3uBkm/IJewxc1dVS6JikWzD7hBhzvPqF2aNLjRZNLvMpEOBgRJ0EWp7N3qf48gObogNCoRarfS8SFBdDtv2SXErALDJS4UdbC5Oa5pWkIHH8GaQE11cCq5A4KkmVHgmBJwBybqFkNXYVLH14WOKmmgEmhIZu9KHW4rVJPU3z5hp7L0WDjpmDqCUDzURTaVGAHnu42pG6XOjsOSKg3fAR2aO0VKNq97y9lFZB10xiE5Z5BZNI3nkVWfE/CIwB5s7iRuodWljfijh5/n4KxdrCwPvVoXgsfvIq/hZM+9A2/XGzU+Dg++B8Fmjn6AsymHw+IjNV+aMyx8F+llwo3FFGOUWVJlmtmoXmTr0W05ORVEg6FksAHirM27AkHmDgrjR3/Lct/E78vqTi+dkLVMfJk4+x1qaZ+a04QdDR0bj8nQ0ku2hEh3y6DFC94e0TYPvPmVhHoZLgQa1bn4ufltBQp80Bd+udoZy96S6gXyOoCADVuO3ZD5/EuA2z7jbTdUdaYTauMK2aH1mK+t6mBUCgqVLfjX/c18+vFkTeC4ce/rs7Uu+vs/YdLTi24E5djJF08BjCNXfklQ6kPm9rNffFzDZIYu6GItuodrChKNBqPyjETPNJkO1kwzBpvFxGr1o61yOCYys/YJLm5zOm9wXHAruO+wJrgWJacZJTxFZq6K1ogFrJ8TMlUdVRWTMiVjFJSsrlxXFqMRxRfSTuUxH1EiYTiVi4gRVPQrvgsGsQ1WVsIjihZUx/xNUYah9bxcRJPIxATDmrd0QEnwhl3koyT/ByPIIdA=";
produces the following text file:
Integrate[1/(1 + x^2), {x, 0, Infinity}]
2 + 2
ErrorBox[ErrorBox[RowBox[{RowBox[{"4", " ", "4"}], "+"}]]]
Subscript[x, 2]^3
α^2 + (Element[b, c])
Here is another approach that offers better readability while retaining the robustness of the method shown in the "UPDATE" section of the previous answer (the only exception being cross-platform compatibility: carriage returns aren't escaped here).
To avoid non-ASCII characters in the output file, I use an improved fromCode function from this answer (it converts character codes into Mathematica's ASCII representation of the corresponding characters):
fromCode[c_Integer] /; c <= 127 := FromCharacterCode[c];
fromCode[c_Integer] :=
  StringTake[
    ToString[FromCharacterCode[c], InputForm, CharacterEncoding -> "PrintableASCII"],
    {2, -2}];
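A quick way to see what fromCode does is to map it over the character codes of a short string. The sketch below repeats the two definitions so that it runs on its own; the sample string is made up for the example.

```mathematica
(* fromCode as defined above, repeated so that this sketch is self-contained *)
fromCode[c_Integer] /; c <= 127 := FromCharacterCode[c];
fromCode[c_Integer] :=
  StringTake[
    ToString[FromCharacterCode[c], InputForm, CharacterEncoding -> "PrintableASCII"],
    {2, -2}];

(* made-up sample: one non-ASCII character followed by two ASCII characters *)
codes = ToCharacterCode["\[Alpha]^2"];

(* ASCII codes pass through unchanged; the Greek letter becomes its long name *)
result = StringJoin[fromCode /@ codes];

(* the result is the 10-character ASCII string \[Alpha]^2 *)
result === "\\[Alpha]^2"
```

Unlike the ToString[..., InputForm] call applied to the whole string, this leaves ASCII characters such as quotes and backslashes untouched and adds no surrounding quotes, which is where the gain in readability comes from.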
The main function:
convertNBtoTXT[fileName_String] :=
  Module[{stream, lines, delim = "\n===============\n"},
    stream = OpenWrite[fileName <> ".txt", BinaryFormat -> True];
    lines = NotebookImport[fileName, "Input" -> "Text"];
    WriteString[stream, Sequence @@ fromCode /@ ToCharacterCode[#], delim] & /@ lines;
    Close[stream];
  ];
Note the Sequence @@ trick: writing character by character avoids the conversion of typesetting into OutputForm, and as a result the linear syntax is exported exactly as it appears in the string.
Usage:
nbFileNames = FileNames["*.nb"];
convertNBtoTXT /@ nbFileNames;
In my tests it correctly handles Unicode and special characters as well as 2D typesetting.
The original input cells can be recovered from the exported ASCII file using FrontEnd`UndocumentedTestFEParserPacket as follows:
fileName = "document.nb.txt";
delim = "\n===============\n";
NotebookPut[
  Notebook[
    Cell[
      First@MathLink`CallFrontEnd[
        FrontEnd`UndocumentedTestFEParserPacket[#, True]],
      "Input"] & /@
      ReadList[fileName, Record, RecordSeparators -> delim]]]
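The record-splitting part of this recovery step can be tried without a front end. The sketch below uses StringSplit on an in-memory string as a stand-in for ReadList with RecordSeparators (an assumption made only to keep the example self-contained); the string exported mimics what convertNBtoTXT writes, with the delimiter after every cell.

```mathematica
delim = "\n===============\n";

(* hypothetical file contents: each cell is followed by the delimiter *)
exported = "2 + 2" <> delim <> "Subscript[x, 2]^3" <> delim;

(* split at the delimiter; the empty piece after the trailing delimiter is dropped by default *)
records = StringSplit[exported, delim]
```

Each element of records corresponds to one input cell and is what would be passed to the parser packet in the code above.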
UPDATE
Using the ExportAsASCII function from this answer we can achieve the same in a more efficient way:
nbFileNames = FileNames["*.nb"];
Do[
  ExportAsASCII[name <> ".txt",
    StringJoin[Riffle[NotebookImport[name, "Input" -> "Text"], "\n=================\n"]]],
  {name, nbFileNames}]