How to read over 4k input without new lines on a terminal?
If I understand the source correctly, under Linux, the maximum number of characters that can be read in one go on a terminal is determined by N_TTY_BUF_SIZE in the kernel source. The value is 4096.
This is a limitation of the terminal interface, specifically the canonical (“cooked”) mode which provides an extremely crude line editor (backspace, enter, Ctrl+D at the start of a line for end-of-file). It happens entirely outside the process that's reading.
You can switch the terminal to raw mode, which disables line processing. It also disables Ctrl+D and other niceties, putting an extra burden on your program.
This is an ancient Unix limitation that's never been fixed because there's little motivation. Humans don't enter such long lines. If you were feeding input from a program, you'd redirect your program's input from a file or a pipe.
For example, to use the content of the X clipboard, pipe from xsel or xclip. In your case:
xsel -b >file.svg
xclip -o -selection clipboard >file.svg
Remove -b or -selection clipboard to use the X primary selection (the one that is set by highlighting with the mouse) rather than the clipboard.
On OSX, use pbpaste to paste the clipboard content (and pbcopy to set it).
You can access the X clipboard over SSH if you activate X11 forwarding with ssh -X (which some servers may forbid). If you can only use ssh without X11 forwarding, you can use scp, sftp or sshfs to copy a file.
If pasting is the only solution because you can't forward the clipboard or you aren't pasting but e.g. faking typing into a virtual machine, an alternative approach is to encode the data into something that has newlines. Base64 is well-suited for this: it transforms arbitrary data into printable characters, and ignores whitespace when decoding. This approach has the additional advantage that it supports arbitrary data in the input, even control characters that the terminal would interpret when pasting. In your case, you can encode the content:
xsel -b | base64 | xsel -b
then decode it by running the following command, pasting the encoded text, and pressing Ctrl+D at the start of a line:
base64 -d
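As a rough check of the base64 approach, the sketch below builds a 5000-character line with no newline, encodes it, and confirms that the encoded text consists of short lines and decodes back to the original. It assumes GNU coreutils base64 (which wraps its output by default and accepts -d to decode); the variable names are purely illustrative.

```shell
# A single 5000-character line with no newline, longer than the 4096 limit.
payload=$(head -c 5000 /dev/zero | tr '\0' 'x')

# Encode it. GNU base64 wraps its output, so every encoded line is short.
encoded=$(printf '%s' "$payload" | base64)

# Length of the longest encoded line.
longest=$(printf '%s\n' "$encoded" |
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')

# Decode again; base64 -d ignores the newlines that the encoder inserted.
decoded=$(printf '%s\n' "$encoded" | base64 -d)

if [ "$decoded" = "$payload" ]; then
    echo "round trip ok, longest encoded line: $longest characters"
fi
```

Other base64 implementations may not wrap by default, in which case the encoded text would itself be one long line and would need to be folded (for example with fold) before pasting.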
The limit you're running into is the maximum size of a line in canonical input mode, MAX_CANON.
In canonical input mode, the tty driver provides basic line editing services so the userspace program doesn't need to. It doesn't have nearly as many features as readline, but it recognizes a few configurable special characters like erase (usually Backspace or Delete) and kill (usually Ctrl-U).
Most importantly for your question, canonical mode buffers input until the end-of-line character is seen. Because the buffer is in the tty driver, in kernel memory, it's not very large.
You can turn off canonical mode with stty cbreak or stty -icanon, and then do your paste. This has the significant disadvantage that you will not be able to send an EOF with Ctrl-D. That's another one of the things that canonical mode is responsible for. You will still be able to terminate the cat with Ctrl-C because the signal-generating characters are controlled by a separate flag, isig (which stays on unless you turn it off with stty raw or stty -isig).
The mystery to me is why, since you've already demonstrated that you know about xclip, you don't just use xclip -o > file instead of the cat.
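If you do go the stty -icanon route, the steps could be wrapped up roughly as follows. This is only a sketch: paste_to_file is a hypothetical helper name, and the trap line assumes the function runs from a script, where it keeps the shell alive when Ctrl-C stops cat. An interactive shell may handle the interrupt differently and skip the restore step.

```shell
# paste_to_file: hypothetical helper, not a standard command.
# Usage: paste_to_file OUTPUT_FILE
paste_to_file() {
    if [ -t 0 ]; then
        saved=$(stty -g)   # save current settings in a form stty can restore
        trap : INT         # assumption: lets a script survive the Ctrl-C below
        stty -icanon       # leave canonical mode: no line is buffered in the kernel
        cat > "$1"         # paste now, then press Ctrl-C (Ctrl-D is just a byte here)
        stty "$saved"      # put the terminal back the way it was
        trap - INT
    else
        # Input is a file or a pipe, so the terminal line limit does not apply.
        cat > "$1"
    fi
}
```

In a script you would call it as paste_to_file file.svg, paste, and press Ctrl-C. Saving the settings with stty -g and restoring them afterwards avoids leaving the terminal in non-canonical mode.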
If you do:
stty eol =
And then run the demo suggested in your EDIT, you will see foo bar in the printout of test.out. The terminal's line discipline will flush its output to its reader as it reads each special eol char in your input.
A Linux canonical-mode terminal - as can be configured with stty icanon or probably just stty sane - handles the following special input characters...
- eof
  - default: ^D
  - Terminates an input line and flushes output to the reader. Because it is removed from input, if it is input as the only character on a line, it is passed as a null read - or end of file - to the reader.
- eol
  - default: unassigned
  - Also terminates an input line, but is not removed from input.
- kill
  - default: ^U
  - Erases all buffered input.
- erase
  - default: ^H (or possibly @ or ^? on some systems)
  - Erases the last buffered input character.
When iexten is also set - like stty icanon iexten or, again, probably just stty sane - a canonical Linux terminal will also handle...
- eol2
  - default: unassigned
  - Also terminates an input line, and is also not removed from input.
- werase
  - default: ^W
  - Erases the last buffered input word.
- rprnt
  - default: ^R
  - Reprints all buffered input.
- lnext
  - default: ^V
  - Removes any special significance as far as the line-discipline is concerned for the immediately following input character.
These characters are handled by removing them from the input stream - excepting eol and eol2, that is - and performing the associated special function before passing the processed stream to the reader - which is usually your shell, but could be whatever the foreground process group is.
Other special input characters which are similarly handled but can be configured independently of any icanon setting include the isig set - set like stty isig and probably also included in a sane configuration:
- quit
  - default: ^\
  - Flushes all buffered input (if noflsh is not set) and sends SIGQUIT to the foreground process-group - likely generating a core-dump.
- susp
  - default: ^Z
  - Flushes all buffered input (if noflsh is not set) and sends SIGTSTP to the foreground process-group. The suspended process-group can likely be resumed with either of kill -CONT "$!" or just fg in a (set -m) job-controlled shell.
- intr
  - default: ^C
  - Flushes all buffered input (if noflsh is not set) and sends SIGINT to the foreground process-group.
And the ixon set - configured like stty ixon and also usually included in a sane config:
- stop
  - default: ^S
  - Stops all output to the reader until either start is read in input or - when ixany is also set - at least one more character is read.
- start
  - default: ^Q
  - Restarts output if it has previously been stopped with stop.
- Both of stop and start are removed from input when processed, but if output is restarted due to any character in input when ixany is set then that character is not removed.
Special characters handled on other non-Linux systems might include...
- flush
  - default: ^O
  - Toggles the discarding and flushing of buffered input and is removed from input.
- dsusp
  - default: unassigned
  - Flushes all buffered input only when the reader reads the assigned special input character then sends SIGTSTP.
And possibly...
- swtch
  - default: ^@ (meaning \0 or NUL)
  - Switches foreground shell-layers. For use with the shl shell-layers application on some systems.
  - An implementation of shl which multiplexes ptys and is therefore compatible with job-control rather than the original implementation's swtch dependent behavior can be freely had in the heirloom-toolchest tool suite.
For a clearer picture of how and why (and perhaps why not) these input functions are handled consult man 3 termios.
All of the above functions can be assigned (or reassigned) - when applicable - like stty function assigned-key. To disable any single function do stty function ^-. Alternatively, as various attempts with assignments for any of the aforementioned line-editing functions with all of GNU, AST, or heirloom's stty implementations seem to indicate, you can also do stty function ^@, as a NUL assignment for any function seems to equate to setting it to unassigned on my Linux system.
You probably do see an echo of these characters when you type them (as can likely be configured with [-]ctlecho), but this is only a marker to show you where you typed them - the program receiving your input has no notion that you typed them (excepting eol[2], that is) and receives only a copy of your input to which the line discipline has applied their effects.
A consequence of the terminal's handling of the various line-editing functions is that it must buffer the input to some extent in order to act upon the functions you request - and so there cannot be a limitless supply of input which you might at any time kill. The line buffer is more precisely the kill buffer.
If you set the eol or eol2 characters to some delimiter which occurs in input - even if neither is a newline or a return character, for example - then you will only be able to kill up to the point that it last occurred and your kill buffer will extend as far as it can until the next of these - or a newline (or return if icrnl is set and igncr is not) - occurs in input.