How to hack GHCi (or Hugs) so that it prints Unicode chars unescaped?
Things will change in the next version of GHCi, 7.6.1, which adds a new option called -interactive-print. The relevant section of the GHC manual is copied below, followed by the myShow and myPrint functions I wrote to use with it.
2.4.8. Using a custom interactive printing function
[New in version 7.6.1] By default, GHCi prints the result of expressions typed at the prompt using the function System.IO.print. Its type signature is Show a => a -> IO (), and it works by converting the value to String using show.
This is not ideal in certain cases, like when the output is long, or contains strings with non-ascii characters.
The -interactive-print flag allows to specify any function of type C a => a -> IO (), for some constraint C, as the function for printing evaluated expressions. The function can reside in any loaded module or any registered package.
As an example, suppose we have following special printing module:
module SpecPrinter where
import System.IO
sprint a = putStrLn $ show a ++ "!"
The sprint function adds an exclamation mark at the end of any printed value. Running GHCi with the command:
ghci -interactive-print=SpecPrinter.sprint SpecPrinter
will start an interactive session where values will be printed using sprint:
*SpecPrinter> [1,2,3]
[1,2,3]!
*SpecPrinter> 42
42!
A custom pretty printing function can be used, for example, to format tree-like and nested structures in a more readable way.
The -interactive-print flag can also be used when running GHC in -e mode:
% ghc -e "[1,2,3]" -interactive-print=SpecPrinter.sprint SpecPrinter
[1,2,3]!
module MyPrint (myPrint, myShow) where
-- preparing for the 7.6.1

myPrint :: Show a => a -> IO ()
myPrint = putStrLn . myShow

myShow :: Show a => a -> String
myShow x = con (show x)
  where
    con :: String -> String
    con [] = []
    con li@(x:xs)
      | x == '\"' = '\"':str++"\""++(con rest)
      | x == '\'' = '\'':char:'\'':(con rest')
      | otherwise = x:con xs
      where
        (str,rest):_ = reads li
        (char,rest'):_ = reads li
And they work well:
*MyPrint> myPrint "asf萨芬速读法"
"asf萨芬速读法"
*MyPrint> myPrint "asdffasdfd"
"asdffasdfd"
*MyPrint> myPrint "asdffa撒旦发"
"asdffa撒旦发"
*MyPrint> myPrint '此'
'此'
*MyPrint> myShow '此'
"'\27492'"
*MyPrint> myPrint '此'
'此'
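One caveat with myShow as written: the lazy patterns on reads fail at run time if a quote in the show output does not start a valid literal, and quotes or control characters inside a string come out raw. A more defensive variant might look like the sketch below. The names ushowSketch, uprintSketch and renderLiteral are made up for illustration; only reads, isPrint and showLitChar come from base.

```haskell
import Data.Char (isPrint, showLitChar)

-- Hypothetical helper: like show, but printable non-ASCII characters
-- inside character and string literals are left unescaped.
ushowSketch :: Show a => a -> String
ushowSketch = go . show
  where
    go :: String -> String
    go [] = []
    go s@(c:cs)
      -- A double quote that really starts a string literal.
      | c == '"'
      , (str, rest) : _ <- (reads s :: [(String, String)])
      = '"' : renderLiteral '"' str ++ '"' : go rest
      -- A single quote that really starts a character literal.
      | c == '\''
      , (ch, rest) : _ <- (reads s :: [(Char, String)])
      = '\'' : renderLiteral '\'' [ch] ++ '\'' : go rest
      -- Anything else (including a stray quote) is copied unchanged.
      | otherwise = c : go cs

-- Re-escape the decoded literal: keep printable non-ASCII characters,
-- escape the enclosing quote, and defer to showLitChar for the rest.
renderLiteral :: Char -> String -> String
renderLiteral quote = foldr step ""
  where
    step c acc
      | c == quote              = '\\' : c : acc
      | c > '\DEL' && isPrint c = c : acc
      | otherwise               = showLitChar c acc

uprintSketch :: Show a => a -> IO ()
uprintSketch = putStrLn . ushowSketch
```

Passing the already-built tail to showLitChar lets it insert the `\&` separator where a numeric escape is followed by a digit, so control characters are still escaped the way show escapes them.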
One way to hack this is to wrap GHCi into a shell wrapper that reads its stdout and unescapes Unicode characters. This is not the Haskell way of course, but it does the job :)
For example, this is a wrapper, ghci-esc, that uses sh and python3 (the 3 is important here):
#!/bin/sh
ghci "$@" | python3 -c '
import sys
import re

def tr(match):
    s = match.group(1)
    try:
        return chr(int(s))
    except ValueError:
        return s

for line in sys.stdin:
    sys.stdout.write(re.sub(r"\\([0-9]{4})", tr, line))
'
Usage of ghci-esc:
$ ./ghci-esc
GHCi, version 7.0.2: http://www.haskell.org/ghc/ :? for help
> "hello"
"hello"
> "привет"
"привет"
> 'Я'
'Я'
> show 'Я'
"'\Я'"
> :q
Leaving GHCi.
Note that not all unescaping above is done correctly, but this is a fast way to show Unicode output to your audience.
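Two of the inaccuracies are visible above: the regular expression only matches escapes of exactly four digits, and it also rewrites an escaped backslash followed by digits (the show 'Я' case). If the filter were written in Haskell instead, a slightly more careful version could look like the following sketch. The name unescapeDecimal is hypothetical, and the function still treats its input as plain text, so it cannot tell whether it is inside a string literal.

```haskell
import Data.Char (chr, isDigit, isPrint)

-- Hypothetical helper: replace decimal escapes such as \1071 with the
-- character they denote, when that character is printable.
unescapeDecimal :: String -> String
-- An escaped backslash is copied as is, so \\1071 is left untouched.
unescapeDecimal ('\\' : '\\' : rest) = '\\' : '\\' : unescapeDecimal rest
unescapeDecimal ('\\' : rest)
  | not (null digits)
  , n <= 0x10FFFF
  , let c = chr (fromInteger n)
  , isPrint c
  = c : unescapeDecimal (dropGap rest')
  where
    (digits, rest') = span isDigit rest
    n = read digits :: Integer
    -- show separates a numeric escape from a following digit with \&;
    -- once the escape is decoded the separator is no longer needed.
    dropGap ('\\' : '&' : xs) = xs
    dropGap xs = xs
-- Everything else, including escapes we chose not to decode, is copied.
unescapeDecimal (c : cs) = c : unescapeDecimal cs
unescapeDecimal [] = []
```

Wiring it up as `main = interact unescapeDecimal` would give a stdin-to-stdout filter that could take the place of the python3 part of the pipeline.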
Option 1 (bad):
Modify this line of code:
https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356
showLitChar c s | c > '\DEL' = showChar '\\' (protectEsc isDec (shows (ord c)) s)
And recompile ghc.
Option 2 (lots of work):
When GHCi type checks a parsed statement it ends up in tcRnStmt, which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). This attempts to type check several variants of the statement that was typed in, including:
let it = expr in print it >> return [coerce HVal it]
Specifically:
print_it = L loc $ ExprStmt (nlHsApp (nlHsVar printName) (nlHsVar fresh_it))
                            (HsVar thenIOName) placeHolderType
All that might need to change here is printName (which binds to System.IO.print). If it instead bound to something like printGhci, which was implemented like:
class ShowGhci a where
    showGhci :: a -> String
    ...

-- Bunch of instances?

instance ShowGhci Char where
    ... -- The instance we want to be different.

printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
GHCi could then change what is printed by bringing different instances into context.
There has been some progress on this issue, thanks to bravit (Vitaly Bragilevsky):
- work in progress: Даёшь кириллицу в GHCi! — 2 -- around the related ticket;
- the result of the work: Даёшь кириллицу в GHCi! — 3 -- with the patch and another one for the docs by bravit (Vitaly Bragilevsky). These enhancements have been committed: 1 and 2.
Probably incorporated into GHC 7.6.1. (Is it?..)
How to make it print Cyrillic now:
The parameter passed to GHCi should be a function which can print Cyrillic. No such function has been found on Hackage. So, we have to create a simple wrapper, as for now:
module UPPrinter where

import System.IO
import Text.PrettyPrint.Leijen

upprint a = (hPutDoc stdout . pretty) a >> putStrLn ""
And run ghci this way:
ghci -interactive-print=UPPrinter.upprint UPPrinter
Of course, this can be written down once and for all in .ghci.
Practical problem: coming up with an alternative nice Show
So, now there is a practical problem: what to use as a substitute for the standard Show (which escapes the wanted symbols against our wish)?
Using others' work: other pretty-printers
Above, Text.PrettyPrint.Leijen is suggested, probably because it is known not to escape such symbols in strings.
Our own Show based on Show -- attractive, but not practical
What about writing our own Show, say, ShowGhci, as was suggested in an answer here? Is it practical?
To save work defining the instances for an alternative Show class (like ShowGhci), one might be tempted to use the existing instances of Show by default and only redefine the instances for String and Char. But that won't work: if you use showGhci = show, then for any complex data containing strings, show is "hard-compiled" to call the old show to show the string. This situation calls for the ability to pass different dictionaries implementing the same class interface to functions which use this interface (show would pass it down to the sub-shows). Are there any GHC extensions for this?
Basing it on Show and wanting to redefine only the instances for Char and String is not very practical if you want it to be as "universal" (widely applicable) as Show.
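The problem can be reproduced in a few lines. This is only a sketch: the class follows the ShowGhci idea from the other answer, the two instances are chosen for illustration, and FlexibleInstances is needed because of the instance for String.

```haskell
{-# LANGUAGE FlexibleInstances #-}

class ShowGhci a where
  showGhci :: a -> String

-- The instance we want to behave differently: no escaping at all.
instance ShowGhci String where
  showGhci s = "\"" ++ s ++ "\""

-- The tempting shortcut: reuse the existing Show instance.
instance Show a => ShowGhci (Maybe a) where
  showGhci = show

-- Uses our String instance, so the Cyrillic text stays readable.
direct :: String
direct = showGhci "привет"

-- Goes through Show (Maybe String), which calls the ordinary show on the
-- inner string, so the escapes are back.
nested :: String
nested = showGhci (Just "привет")
```

Here direct is the quoted word itself, while nested is the same text as show (Just "привет"), numeric escapes included. Getting readable output for the nested value would mean writing a real ShowGhci instance for Maybe, and likewise for every other container type.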
Re-parsing show
A more practical (and short) solution is in another answer here: parse the output from show to detect chars and strings, and re-format them. (Although it seems a bit ugly semantically, the solution is short and safe in most cases, provided quotes are not used for other purposes in the show output; that should not happen for standard types, because the idea of show is to produce more-or-less correct, parsable Haskell.)
Semantic types in your programs
And one more remark.
Actually, if we care about debugging in GHCi (and not simply demonstrating Haskell and wanting to have a pretty output), the need for showing non-ASCII letters must come from some inherent presence of these characters in your program (otherwise, for debugging, you could substitute them with Latin characters or not care much about being shown the codes). In other words, there is some MEANING in these characters or strings from the point of view of the problem domain. (For example, I've been recently engaged with grammatical analysis of Russian, and the Russian words as part of an example dictionary were "inherently" present in my program. Its work would make sense only with these specific words. So I needed to read them when debugging.)
But look, if the strings have some MEANING, then they are not plain strings any more; they are data of a meaningful type. The program would probably become even better and safer if you declared a special type for this kind of meaning.
And then, hooray!, you simply define your instance of Show for this type. And you are OK with debugging your program in GHCi.
As an example, in my program for grammatical analysis, I have done:
newtype Vocable = Vocable2 { ortho :: String } deriving (Eq,Ord)
instance IsString Vocable -- to simplify typing the values (with OverloadedStrings)
    where fromString = Vocable2 . fromString
and
newtype Lexeme = Lexeme2 { lemma :: String } deriving (Eq,Ord)
instance IsString Lexeme -- to simplify typing the values (with OverloadedStrings)
    where fromString = Lexeme2 . fromString
(the extra fromString here is because I might switch the internal representation from String to ByteString or whatever)
Apart from being able to show them nicely, I gained safety, because I can no longer mix up different types of words when composing my code.
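The Show instance itself is not shown above, so the following is only a guess at what it might look like for Vocable: a minimal sketch that prints the word in quotes without escaping anything. Embedded quotes or control characters would come out raw, which I assume is acceptable for dictionary words.

```haskell
import Data.String (IsString (..))

newtype Vocable = Vocable2 { ortho :: String } deriving (Eq, Ord)

instance IsString Vocable where
  fromString = Vocable2 . fromString

-- Assumed instance (not from the original program): show the word as it
-- is written, wrapped in double quotes, with no escaping.
instance Show Vocable where
  show v = "\"" ++ ortho v ++ "\""

-- A sample value; with OverloadedStrings a plain literal would also work.
example :: [Vocable]
example = [fromString "слово", fromString "дело"]
```

Because lists, tuples and derived Show instances call this instance for the Vocable values inside them, nested values such as example are displayed unescaped as well.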