My god, it's full of spaces!
PowerShell, 414 409 bytes
function g($a){if($a.length-gt2){g $a[0],(g $a[1..100])}else{if(!$a[1]){$a[0]}else{g $a[1],($a[0]%$a[1])}}}
$b={($n|sls '^ +|(?<!^)  +' -a).Matches}
$n=$input-split"`n"
$s=g(&$b|%{$_.Index+$_.Length})
($n|%{$n=$_
$w=@(&$b)
$c=($n|sls '(?<!^| ) (?! )'-a).Matches
$w+$c|sort index -d|%{$x=$_.Index
$l=$_.Length
if($s-and!(($x+$l)%$s)){$n=$n-replace"(?<=^.{$x}) {$l}",("`t"*(($l/$s),1-ge1)[0])}}
$n})-join"`n"
I went ahead and used newlines instead of ; where possible to make display easier. I'm using Unix line endings, so it shouldn't affect the byte count.
How To Execute
Copy the code into a SpaceMadness.ps1 file, then pipe the input into the script. I will assume the file that needs converting is called taboo.txt:
From PowerShell:
cat .\taboo.txt | .\SpaceMadness.ps1
From command prompt:
type .\taboo.txt | powershell.exe -File .\SpaceMadness.ps1
I tested it with PowerShell 5, but it should work on 3 or higher.
Testing
Here's a quick PowerShell script that's useful for testing the above:
[CmdletBinding()]
param(
    [Parameter(
        Mandatory=$true,
        ValueFromPipeline=$true
    )]
    [System.IO.FileInfo[]]
    $File
)
Begin {
    $spaces = Join-Path $PSScriptRoot SpaceMadness.ps1
}
Process {
    $File | ForEach-Object {
        $ex = Join-Path $PSScriptRoot $_.Name
        Write-Host $ex -ForegroundColor Green
        Write-Host ('='*40) -ForegroundColor Green
        (gc $ex -Raw | & $spaces)-split'\r?\n'|%{[regex]::Escape($_)} | Write-Host -ForegroundColor White -BackgroundColor Black
        Write-Host "`n"
    }
}
Put this in the same directory as SpaceMadness.ps1. I call this one tester.ps1; call it like so:
"C:\Source\SomeFileWithSpaces.cpp" | .\tester.ps1
.\tester.ps1 C:\file1.txt,C:\file2.txt
dir C:\Source\*.rb -Recurse | .\tester.ps1
You get the idea. It spits out the contents of each file after conversion, run through [RegEx]::Escape(), which happens to escape both spaces and tabs, so it's really convenient to see what's actually been changed.
The output looks like this (but with colors):
C:\Scripts\Powershell\Golf\ex3.txt
========================================
int
\tmain\(\ \)
\t\{
\t\tputs\("TABS!"\);
\t}
Explanation
The very first line defines a greatest common factor/divisor function g as succinctly as I could manage. It takes an array (an arbitrary number of numbers) and calculates the GCD recursively using the Euclidean algorithm.
The purpose of this was to figure out the "longest possible tab width" by taking the index + length of every indentation and tabulation as defined in the question, then feeding it to this function to get the GCD which I think is the best we can do for tab width. A confusion's length will always be 1 so it contributes nothing to this calculation.
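For anyone who doesn't want to untangle the golfed g, here is a rough Python sketch of the same idea: a GCD over an arbitrary list of numbers built on the Euclidean algorithm. The names gcd and gcd_all are mine, and it folds left to right rather than recursing on the tail of the array like the PowerShell does. Returning 0 for an empty list is my own choice.

```python
from functools import reduce

def gcd(a, b):
    # Euclidean algorithm: gcd(a, b) == gcd(b, a % b), and gcd(a, 0) == a.
    while b:
        a, b = b, a % b
    return a

def gcd_all(numbers):
    # Fold the pairwise GCD over the list. Starting from 0 works because
    # gcd(0, n) == n; an empty list therefore gives 0 (assumption of this sketch).
    return reduce(gcd, numbers, 0)

print(gcd_all([4, 8, 12]))  # 4
print(gcd_all([4, 6]))      # 2
print(gcd_all([]))          # 0
```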
$b defines a scriptblock because, annoyingly, I need to call that piece of code twice, so I save some bytes that way. This block takes the string (or array of strings) $n and runs a regex on it (sls, or Select-String), returning match objects. I'm actually getting both indentations and tabulations in one pass here, which saved me the extra processing of capturing them separately.
$n is used for different things inside and outside the main loop (really bad, but necessary here so that I can embed it in $b's scriptblock and use that both inside and outside the loop without a lengthy param() declaration and passing arguments).
$s gets assigned the tab width by calling the $b block on the array of lines in the input file, then summing the index and length of each match, and passing the array of sums as an argument to the GCD function. So $s has the size of our tab stops now.
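A small Python sketch of that step, under my reading of the challenge that an indentation is a run of spaces at the start of a line and a tabulation is a run of two or more spaces anywhere else. The names RUNS and tab_width are made up for illustration, and math.gcd stands in for the hand-rolled g.

```python
import re
from functools import reduce
from math import gcd

# Assumed definitions: leading spaces, or two-plus spaces not at line start.
RUNS = re.compile(r"^ +|(?<!^)  +")

def tab_width(lines):
    # m.end() is index + length: the column where each run of spaces stops.
    ends = [m.end() for line in lines for m in RUNS.finditer(line)]
    # GCD of all those columns; 0 when there are no runs at all.
    return reduce(gcd, ends, 0)

sample = ["int", "    main( )", "    {", '        puts("TABS!");', "    }"]
print(tab_width(sample))       # 4
print(tab_width(["a b", "c"]))  # 0
```

The single space in "main( )" is not matched by either branch, so it has no influence on the result.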
Then the loop starts. We iterate over each line in the array of input lines $n. The first thing I do in the loop is assign $n (local scope) the value of the current line, for the above reason.
$w gets the value of the scriptblock call for the current line only (the indentations and tabulations for the current line).
$c gets a similar value, but instead we find all the confusions.
I add up $w and $c, which are arrays, giving me one array with all of the space matches I need, sort it in descending order by index, and begin iterating over each match for the current line.
The sort is important. Early on I found out the hard way that replacing parts of a string based on index values is a bad idea when the replacement string is smaller and changes the length of the string! The other indexes get invalidated. So by starting with the highest indexes on each line, I make sure I only make the string shorter from the end, and move backwards so the indexes always work.
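To make the index problem concrete, here is a minimal Python sketch (replace_spans is a hypothetical helper, not part of the answer) that swaps runs of spaces for a single tab, once in descending and once in ascending index order:

```python
def replace_spans(line, spans, descending=True):
    # spans holds (start, length) pairs measured against the ORIGINAL line.
    for start, length in sorted(spans, reverse=descending):
        line = line[:start] + "\t" + line[start + length:]
    return line

line = "    a   b"
spans = [(0, 4), (5, 3)]

# Highest index first: earlier positions are untouched when we reach them.
print(repr(replace_spans(line, spans)))                    # '\ta\tb'
# Lowest index first: the line shrinks, so (5, 3) now points at the wrong text.
print(repr(replace_spans(line, spans, descending=False)))  # '\ta   \t'
```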
In this loop, $x is the index of the current match and $l is the length of the current match. $s can in fact be 0, and that causes a pesky divide-by-zero error, so I'm checking for its validity and then doing the math.
The !(($x+$l)%$s) bit there is the single point where I check to see if a confusion should be replaced with a tab or not. If the index plus the length, divided by the tab width, has no remainder, then we're good to go in replacing this match with a tab (that math will always work on the indentations and tabulations, because their size is what determined the tab width to begin with).
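The same test written out in Python, as a sketch only (ends_on_tab_stop is a name I made up):

```python
def ends_on_tab_stop(index, length, tab_width):
    # A tab width of 0 means no indentation or tabulation was found, so
    # nothing is converted; checking it first also avoids the modulo by zero.
    return tab_width != 0 and (index + length) % tab_width == 0

print(ends_on_tab_stop(3, 1, 4))  # True: the space ends at column 4
print(ends_on_tab_stop(4, 1, 4))  # False: it ends at column 5
print(ends_on_tab_stop(0, 8, 4))  # True: an 8-space indentation
print(ends_on_tab_stop(3, 1, 0))  # False: no tab width to align to
```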
For the replace, each iteration of the match loop works on the current line of the input, so it's a cumulative set of replaces. The regex just looks for $l spaces that are preceded by $x of any character. We replace it with $l/$s tab characters (or 1 if that number is below one).
This part, (($l/$s),1-ge1)[0], is a fancy convoluted way of saying if (($l/$s) -lt 1) { 1 } else { $l/$s }, or alternatively [Math]::Max(1,($l/$s)). It makes an array of $l/$s and 1, then uses -ge 1 to return an array containing only the elements that are greater than or equal to one, then takes the first element. It comes in a few bytes shorter than the [Math]::Max version.
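Here is the filter-then-take-first trick mirrored in Python, just to show that it behaves like a max with 1. This is an illustration with a made-up name (tabs_for), not a translation of how PowerShell handles the resulting number when repeating the tab string.

```python
def tabs_for(length, tab_width):
    candidates = [length / tab_width, 1]
    # Keep only the values >= 1, then take the first survivor. The literal 1
    # always survives, so the list is never empty.
    return [c for c in candidates if c >= 1][0]

print(tabs_for(8, 4))  # 2.0
print(tabs_for(1, 4))  # 1
print(tabs_for(8, 4) == max(1, 8 / 4))  # True
```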
So once all of the replaces are done, the current line is returned from the ForEach-Object (%) iteration, and when all of them are returned (an array of fixed lines), it's -joined with newlines (since we split on newlines in the beginning).
I feel like there's room for improvement here that I'm too burnt out to catch right now, but maybe I'll see something later.
Tabs 4 lyfe
Pyth, 102 103 bytes
=T|u?<1hHiGeHGsKmtu++J+hHhGlhtH+tG]+HJ.b,YN-dk<1u+G?H1+1.)Gd]0]0cR\ .zZ8VKVNp?%eNT*hNd*/+tThNTC9p@N1)pb
Try it Online
Interesting idea, but since tabs in the input break the concept, not very usable.
Edit: Fixed bug. Many thanks, @aditsu.
PHP - 278 210 bytes
The function works by testing each tab width, starting with a value of 100, the maximal length of a line and therefore the maximal tab width.
For each tab width, we split each line into "blocks" of that length. For each of these blocks:
- If, by concatenating the last character of the previous block with this block, we find two consecutive spaces before a character, we have an indentation or a tabulation that can't be transformed to a tab without altering the appearance; we try the next tab width.
- Otherwise, if the last character is a space, we strip spaces at end of the block, add a tabulator and memorise the whole thing.
- Otherwise, we just memorise the block.
Once all the blocks of a line have been analysed, we memorise a linefeed. If all the blocks of all the lines were analysed with success, we return the string we've memorised. Otherwise, if every strictly positive tab width has been tried, there was neither tabulation nor indentation, and we return the original string.
function($s){for($t=101;--$t;){$c='';foreach(split('
',$s)as$l){$e='';foreach(str_split($l,$t)as$b){if(ereg('  [^ ]',$e.$b))continue 3;$c.=($e=substr($b,-1))==' '?rtrim($b).'	':$b;}$c.='
';}return$c;}return$s;}
Here is the ungolfed version:
function convertSpacesToTabs($string)
{
    // Try every tab width, widest first.
    for ($tabWidth = 100; $tabWidth > 0; --$tabWidth)
    {
        $convertedString = '';
        foreach (explode("\n", $string) as $line)
        {
            $lastCharacter = '';
            foreach (str_split($line, $tabWidth) as $block)
            {
                // Two spaces followed by a non-space: a run of spaces that
                // does not end on a tab stop, so this width is rejected.
                if (preg_match('#  [^ ]#', $lastCharacter.$block))
                {
                    continue 3;
                }
                $lastCharacter = substr($block, -1);
                if ($lastCharacter == ' ')
                {
                    $convertedString .= rtrim($block) ."\t";
                }
                else
                {
                    $convertedString .= $block;
                }
            }
            $convertedString .= "\n";
        }
        return $convertedString;
    }
    return $string;
}
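For readers who want to trace the block-by-block logic on a concrete input, here is a rough Python port of the ungolfed function. It is only a sketch: spaces_to_tabs and max_width are names I picked, rstrip(" ") strips only spaces whereas PHP's rtrim strips other whitespace too, and the regex assumes the "two consecutive spaces before a character" rule from the description above.

```python
import re

def spaces_to_tabs(text, max_width=100):
    for width in range(max_width, 0, -1):      # widest tab width first
        out, ok = [], True
        for line in text.split("\n"):
            last = ""
            for i in range(0, len(line), width):
                block = line[i:i + width]
                # Two spaces then a non-space: a run that ends off a tab stop.
                if re.search(r"  [^ ]", last + block):
                    ok = False
                    break
                last = block[-1]
                # A block ending in a space gets its trailing spaces
                # replaced by one tab; other blocks are kept as they are.
                out.append(block.rstrip(" ") + "\t" if last == " " else block)
            if not ok:
                break                           # give up on this width
            out.append("\n")
        if ok:
            return "".join(out)
    return text

sample = 'int\n    main( )\n    {\n        puts("TABS!");\n    }'
print(repr(spaces_to_tabs(sample)))
# 'int\n\tmain( )\n\t{\n\t\tputs("TABS!");\n\t}\n'
```

Like the PHP, it appends a linefeed after every line, so the result ends with a newline even when the input does not.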
Special thanks to DankMemes for saving 2 bytes.