How to find out what files are touched when Get is called?

Here I offer a safe version of Get that can be called repeatedly to collect all the source files and contexts of a package without polluting memory (too much).

What it does

I have practically reverse-engineered all the necessary functions (Get, Needs, BeginPackage, Begin, EndPackage and End) so that I could inject monitoring code for introspection. The $Context and $ContextPath stacks are correctly set and reset, and needs and get behave exactly like their built-in counterparts (at least in my extensive testing). During execution, all contexts created and all files encountered (even those not opened by Needs because the package is already in $Packages) are collected. At the end, $Context, $ContextPath and $Packages are reset to their original values, and every symbol of every visited context (even private ones) is removed, in an attempt to restore the pre-call state of memory as closely as possible.

contextJoin[s: {__String}] := StringReplace[StringJoin[#<>"`"& /@s], "`".. -> "`"]
packageButton[file_String] := 
  Button[FileNameTake@file, NotebookOpen@file, Appearance -> "Palette"];

safeGet[pkg_String, arg___, opts : OptionsPattern[]] := Module[{
    bp, ep, begin, end, get, needs, contexts = {}, files = {}, 
    assoc = {}, edges = {}, all, $input, from = "Global`", 
    cStack = {$Context}, cpStack = {$ContextPath}},
   Block[{$Packages = $Packages, $ContextPath = $ContextPath, $Context = $Context},

    Off[General::shdw];
    Options[get] = Options@Get; (* TODO: key for encoding... *)
    get[package_String, key___String, getopts : OptionsPattern[]] := 
     Block[{$Path = $Path, System`Private`$InputFileName},
      If[FreeQ[$Path, #], AppendTo[$Path, #]] & /@ OptionValue@Path;
      System`Private`$InputFileName = FindFile@package;
      AppendTo[edges, from -> System`Private`$InputFileName];
      files = Union[files, {System`Private`$InputFileName}];
      $input = package;
      Block[{from = System`Private`$InputFileName, stream, last, temp},
       stream = OpenRead[System`Private`$InputFileName];
       While[(temp = Read[stream, Hold@Expression]) =!= EndOfFile, 
        last = ReleaseHold@(temp /. {HoldPattern@$Input -> $input})];
       Close@stream;
       last]];
    needs[package_String] := needs[package, FindFile@package];
    needs[package_String, file_String] := Module[{res},
      If[FreeQ[$ContextPath, package], PrependTo[$ContextPath, package]];
      (* Sic! Added regardless whether package was really loaded or not.*)
      If[FreeQ[$Packages, package],
       PrependTo[$Packages, package];
       If[FreeQ[files, file], AppendTo[files, file]];
       res = get@file;
       If[FreeQ[$Packages, package], Message[Needs::nocont, package]];
       (* Sic! Only $Packages are checked but not $ContextPath. *)
       res
       ]];
    bp[ctx_] := bp[ctx, {}];
    bp[ctx_, needed_List] := (
      AppendTo[cStack, $Context];
      AppendTo[cpStack, $ContextPath];
      $ContextPath = DeleteDuplicates@Join[{ctx}, needed, {"System`"}];
      (* DeleteDuplicates keeps order, unlike Union. *)
      $Packages = DeleteDuplicates@Prepend[$Packages, ctx];
      $Context = ctx;
      assoc = Union[assoc, {$Context -> packageButton@System`Private`$InputFileName}];
      contexts = Union[contexts, {$Context}];
      needs /@ needed; (* Public import as it should be for BeginPackage *)
      ctx
      );
    begin[ctx_] := (
      AppendTo[cStack, $Context];
      $Context = contextJoin@{$Context, ctx};
      assoc = Union[assoc, {$Context -> packageButton@System`Private`$InputFileName}];
      contexts = Union[contexts, {$Context}];
      ctx
      );
    ep[] := ({$ContextPath, cpStack} = {Last@cpStack, Most@cpStack};
        {$Context, cStack} = {Last@cStack, Most@cStack};);
    end[] := ({$Context, cStack} = {Last@cStack, Most@cStack};);
    Block[{BeginPackage = bp, EndPackage = ep, Begin = begin, 
      End = end, Needs = needs, Get = get}, Get[pkg, arg]];(* Main call. *)
    all = # <> "*" & /@ contexts;
    Unprotect /@ all;
    Quiet[Remove /@ all];
    On[General::shdw];
    {
      "Package" -> pkg,
      "Contexts" -> contexts,
      "Files" -> (packageButton /@ files),
      "Associations" -> (assoc),
      "Names" -> (Names /@ all),
      "DependencyGraph" -> 
       Graph[edges, EdgeStyle -> Arrowheads@Medium, 
        GraphLayout -> {"LayeredEmbedding", 
          "RootVertex" -> edges[[1, 1]], LayerSizeFunction -> (.4 &), 
          "LeafDistance" -> 1}, 
        VertexShapeFunction -> (Inset[Style[packageButton@#2, Black], #1, #3] &), 
        ImageSize -> 1000]
      }
    ]];
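As a side note on the helper defined at the top of the code: contextJoin is what lets begin resolve relative contexts such as Begin["`Private`"]. Below is a minimal check of what it returns; the definition is repeated so that the snippet stands alone. Note that, as written, it treats every argument as relative to the current context.

```mathematica
(* Same definition as above: append a backtick to every part, join the parts,
   then collapse runs of backticks into a single one. *)
contextJoin[s : {__String}] := 
  StringReplace[StringJoin[# <> "`" & /@ s], "`" .. -> "`"]

contextJoin[{"OpenCLLink`", "`Private`"}]  (* "OpenCLLink`Private`" *)
contextJoin[{"A`", "B`"}]                  (* "A`B`" *)
```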

safeGet returns the following things:

  1. The package name
  2. All the contexts (even private ones) created during the call
  3. All the files visited during the call
  4. An association list of contexts -> files, so that later any symbol (i.e. its source file) can be easily tracked.
  5. The names of any symbols still present in the visited contexts after cleanup (ideally none).
  6. A dependency graph of the related package files.

Let's try it with a complex package call:

safeGet@"OpenCLLink`"
 {"Package" -> "OpenCLLink`",
  "Contexts" -> {"CCompilerDriver`",  "CCompilerDriver`CCompilerDriverBase`", 
                 ...
                 "LibraryLink`", "LibraryLink`Private`", "OpenCLLink`", 
                 "OpenCLLink`Private`", "SymbolicC`", "SymbolicC`Private`"},
  "Files" -> {"CCompilerDriverBase.m", ..., "WGLPrivate.m"},
  "Names" -> {{}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
              {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {}, {},
              {"LibraryLink`$LibraryError"}, {}, {}, {}, {}, {}},
  "DependencyGraph" -> Graph[...]}

The graph is the most informative:

[Figure: dependency graph returned for OpenCLLink`]

Note that all the contexts touched during the call (stored in contexts) and their symbols are correctly removed, so that at the end, no symbols remain to be listed by Names. The only exception is LibraryLink`$LibraryError: it is probably used by some other function/package already loaded.

Another example:

"DependencyGraph" /. safeGet@"PacletManager`"

[Figure: dependency graph returned for PacletManager`]

Some details of the code

  • Files read directly with OpenRead or Import (i.e. omitting Get or Needs) are not captured!
  • Since $Input has attribute Locked, it cannot be modified locally like $InputFileName, so the Hold/ReleaseHold is required to make the replacement structurally by hand. $Input is used e.g. in the PacletManager`PacletManager.m file; $InputFileName is used extensively in many packages (e.g. CUDALink`).
  • Iterative reading is necessary (ReadList won't work) because any context called with BeginPackage must be created before context-symbols are parsed, otherwise new symbols would be created in the Global` context.
  • To minimize the footprint of the package, unsafe functions can be suppressed by adding them to the replacement list inside get where $Input is replaced (e.g. Print|Save|... -> Hold).
  • Options Method and CharacterEncoding of Get are not implemented, but I did not find any case where they are actually used.
  • No error messages are issued when an invalid context is given (BeginPackage::cxt and Begin::cxt).
  • Symbols created (and protected) by the package directly in the Global` context during the call cannot be created again in a second call, so the standard errors are generated because Mathematica cannot overwrite the symbol (Set::write and SetDelayed::write). These symbols could perhaps be captured and suppressed via $NewSymbol.
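Two of the points above can be illustrated in isolation. The following is only a rough sketch, not part of safeGet: a string stream stands in for a package file, Print stands in for the "unsafe" functions, and the names stream, temp, last, newGlobals and hypotheticalFreshSymbol are made up for the example. It assumes that the current context is Global` and that hypotheticalFreshSymbol does not exist yet.

```mathematica
(* Part 1: read expressions one at a time in held form and replace an
   unwanted head before evaluating, as get does with $Input. *)
stream = StringToStream["Print[\"side effect\"]; 1 + 1\n2 + 3"];
While[(temp = Read[stream, Hold[Expression]]) =!= EndOfFile,
  (* Print is turned into Hold, so its argument is never printed. *)
  last = ReleaseHold[temp /. HoldPattern[Print] -> Hold]];
Close[stream];
last  (* value of the last expression read, here 5 *)

(* Part 2: record symbols created in Global` while some code runs.
   $NewSymbol is applied to the name and context of each new symbol. *)
newGlobals = {};
Block[{$NewSymbol = If[#2 === "Global`", AppendTo[newGlobals, #2 <> #1]] &},
  (* ToExpression creates the symbol at evaluation time, inside the Block. *)
  ToExpression["hypotheticalFreshSymbol = 1"]];
newGlobals
```

Under the stated assumptions, newGlobals should end up containing "Global`hypotheticalFreshSymbol"; a list collected this way could then be passed to Remove during cleanup.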

This is not a complete answer, but here I will suggest a somewhat different way to extract the dependencies than the one in Istvan's answer. I think my method is somewhat more economical. The idea is to load the package of interest in a dynamic environment where the $Packages variable is initially reset to {}. This automatically forces all packages to be loaded afresh, even those which have already been loaded.

Here is the code for such an environment (formatting done using the code formatting palette):

ClearAll[makeCustomLoadingEnvironment];
makeCustomLoadingEnvironment[argF_]:=
    Module[{seen,depth},
        Function[                
            code,
            Block[{$Packages={},seen,depth=0},                    
                seen[___]:=
                    False;
                Internal`InheritedBlock[                        
                    {Get},                        
                    Unprotect[Get];
                    call:Get[args___]/;!seen[args]:=
                        Module[{},                                
                            seen[args]=True;
                            depth++;
                            argF[{$InputFileName,depth},args];
                            call;
                            depth--;
                        ];
                    Protect[Get];
                    code
                ]
            ],
            HoldAll
        ]
    ];

This constructs the dynamic environment where Get is overloaded, such that an arbitrary function gets applied to its arguments prior to the actual call. Note that since Get is called recursively, the standard Villegas-Gayley technique is not enough, and we actually need a cache of the argument lists that have already been seen.

Here is a function which actually extracts the dependencies:

ClearAll[getDependencies];
getDependencies[context_String]:=
    Module[{tag,env},            
        env=makeCustomLoadingEnvironment[Sow[{#1,{##2}},tag]&];
        (If[#1==={},{},First[#1]]&)[Reap[env[Get[context]],tag,#2&][[2]]]
    ];

When called as, for example,

getDependencies["OpenCLLink`"] // Short[#, 6] &

it returns a list in which each sublist has this structure:

{{filename, depth}, {dependencies}}

and each element of the dependencies sequence can be either a file name or a context name.

From this result, one can reconstruct the dependency tree / graph. I still have a few glitches in my code which does that, so will post it when I fix those. But my main point here was in any case the method described above, while the dependency graph extraction (or other means of transforming the list of dependencies above to some convenient data structure) is a somewhat separate topic.
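As a rough illustration of that last step (this is not the code referred to above, only a minimal sketch), a list in the {{filename, depth}, {dependencies}} format can be turned into graph edges by threading each calling file over the arguments it passed to Get. The sample data, the paths and the "top level" label are made up for the example.

```mathematica
(* Hypothetical sample in the {{filename, depth}, {dependencies}} format. *)
deps = {
   {{"", 1}, {"A`"}},
   {{"/hypothetical/A/Kernel/init.m", 2}, {"/hypothetical/A/A.m"}},
   {{"/hypothetical/A/A.m", 3}, {"B`"}}
   };

(* One edge per (calling file -> requested file or context) pair.
   An empty file name means the call came from the top level. *)
edges = Flatten[
   Thread[(#[[1, 1]] /. "" -> "top level") -> #[[2]]] & /@ deps];

Graph[edges, VertexLabels -> "Name"]
```

Because callers are recorded as file names while the requested items may be context names, the resulting graph is generally disconnected; resolving contexts to files (for instance with FindFile) would be one way to join the pieces, though that step is not shown here.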