Did you ever saw a char and thought: "Damn, 1 byte for a single char is pretty darn inefficient"? No? Well I did. So what I decided to do instead is to pack 5 chars, convert each char to a 2 digit integer and then concat those 5 2 digit ints together into one big unsigned int and boom, I saved 5 chars using only 4 instead of 5 bytes. The reason this works is, because one unsigned int is a ten digit long number and so I can save one char using 2 digits. In theory you could save 32 different chars using this technique (the first two digits of an unsigned int are 42 and if you dont want to account for a possible 0 in the beginning you end up with 32 chars). If you would decide to use all 10 digits you could save exactly 3 chars. Why should anyone do that? Idk. Is it way to much work to be useful? Yes. Was it funny? Yes.

Anyone whos interested in the code: Heres how I did it in C: https://pastebin.com/hDeHijX6

Yes I know, the code is probably bad, but I do not care. It was just a funny useless idea I had.

top 50 comments

sorted by: hot top controversial new old

[–] bandwidthcrisis@lemmy.world 56 points 1 week ago

You would have done well with this kind of thinking in the mid-80s when you needed to fit code and data into maybe 16k!

As long as you were happy to rewrite it in Z80 or 6502.

Another alternative is arithmetic encoding. For instance, if you only needed to store A-Z and space, you code those as 0-26, then multiply each char by 1, 27, 27^2, 26^3 etc, the add them.

To unpack them, divide by 27 repeatedly, the remainder each time is each character. It's simply covering numbers to base-27.

It wouldn't make much difference from using 5 bits per char for a short run, though, but could be efficient for longer strings, or if encoding a smaller set of characters.

[–] BartyDeCanter@lemmy.sdf.org 30 points 1 week ago (7 children)

Oh god that switch statement. Here, let me come up with something better:

if (pChar >= 'a' && pChar <= 'z') {
  return pChar - 'a' + 10;
} else if (pChar == ' ') {
  return 36;
} else if (pChar == '.'){
  return 37;
}
return 0;

[–] cows_are_underrated@feddit.org 9 points 1 week ago

Ah, thats cool. Did not knew you could that.. Thanks.

[–] misteloct@lemmy.dbzer0.com 6 points 1 week ago

First rule of code review, do not sound judgemental.

load more comments (5 replies)

[–] Valmond@lemmy.world 28 points 1 week ago (6 children)

CPU still pulls a 32kb block from RAM...

[–] enumerator4829@sh.itjust.works 16 points 1 week ago (1 children)

Lol, using RAM like last century. We have enough L3 cache for a full linux desktop in cache. Git gud and don’t miss it (/s).

(As an aside, now I want to see a version of puppylinux running entirely in L3 cache)

load more comments (1 replies)

[–] BartyDeCanter@lemmy.sdf.org 8 points 1 week ago

Look at this guy with their fancy RAM caches.

load more comments (4 replies)

[–] RiQuY@lemmy.zip 27 points 1 week ago (3 children)

Interesting idea but type conversion and parsing is much more slower than wasting 1 byte. Nowadays memory is "free" and the main issue is the execution speed.

[–] rtxn@lemmy.world 10 points 1 week ago (1 children)

Fuck it. *uses ulong to store a boolean*

load more comments (1 replies)

[–] cows_are_underrated@feddit.org 8 points 1 week ago

I know. This whole thing was never meant to be very useful, and more like a proof of concept

[–] jenesaisquoi@feddit.org 6 points 1 week ago

Alignment wastes much more anyways

[–] traceur301@lemmy.blahaj.zone 26 points 1 week ago (3 children)

I'm not sure if this is the right setting for technical discussion, but as a relative elder of computing I'd like to answer the question in the image earnestly. There's a few factors squeezing the practicality out of this for almost all applications: processor architectures (like all of them these days) make operating on packed characters take more operations than 8 bit characters so there's a speed tradeoff (especially considering cache and pipelining). Computers these days are built to handle extremely memory demanding video and 3d workloads and memory usage of text data is basically a blip in comparison. When it comes to actual storage and not in-memory representation, compression algorithms typically perform better than just packing each character into fewer bits. You'd need to be in a pretty specific niche for this technique to come in handy again, for better or for worse

[–] cows_are_underrated@feddit.org 6 points 1 week ago (1 children)

This is 100% true. I never plan on actually using this. It might be useful for working on microcontrollers like an ESP32, but apart from that the trade of for more computational power is not worth the memory savings.

load more comments (1 replies)

[–] gusgalarnyk@lemmy.world 6 points 1 week ago

I liked the technical discussion so thank you. Keep it up, I got into this career because there was always so much to learn.

load more comments (1 replies)

[–] drath@lemmy.world 22 points 1 week ago* (last edited 1 week ago)

Oh god, please don't. Just use utf8mb4 like a normal human being, and let the encoding issues finally die out (when microsoft kills code pages). If space is of consideration, just use compression, like gz or something.

[–] anton@lemmy.blahaj.zone 21 points 1 week ago (1 children)

You could save 0.64 bit per char more if you actually treated you output as a binary number (using 6 bits per char) and didn't go through the intermediary string (implicitly using base 100 at 6.64 bits per char).
This would also make your life easier by allowing bit manipulation to slice/move parts and reducing work for the processor because base 100 means integer divisions, and base 64 means bit shifts. If you want to go down the road of a "complicated" base use base 38 and get similar drawbacks as now, except only 5.25 bits per char.

[–] Redkey@programming.dev 7 points 1 week ago

I was so triggered by the conversion from char-to-int-to-string-to-packedint that I had to write a bitwise version that just does char-to-packedint (and back again), with bitwise operators.

https://pastebin.com/V2An9Xva

As others have pointed out, there are probably better options for doing this today in most real-life situations, but it might make sense on old low-spec systems if not for all the intermediate conversion steps, which is why I wrote this.

[–] null@lemmy.nullspace.lol 20 points 1 week ago

Not useless -- you have a future in tiny, embedded systems.

[–] joseandres42@lemmy.world 18 points 1 week ago

I do this kind of thing everyday as a firmware engineer :)

[–] firelizzard@programming.dev 17 points 1 week ago (3 children)

Does the efficiency of storage actually matter? Are you working on a constrained system like a microcontroller? Because if you’re working on regular software, supporting Unicode is waaaaaaaaaaay more valuable than 20% smaller text storage.

[–] ryannathans@aussie.zone 20 points 1 week ago

Unicode? Sir this is C, if the character doesn't fit into a uint8 it's scope creep and too hard

load more comments (2 replies)

[–] magic_lobster_party@fedia.io 17 points 1 week ago (4 children)

It’s all fun and games until the requirement changes and you need to support uppercase letters and digits as well.

load more comments (4 replies)

[–] ChaoticNeutralCzech@feddit.org 10 points 1 week ago (1 children)

unsigned int turn_char_to_int(char pChar)
{
    switch(pChar)
    {
    case 'a':
        return 10;
    case 'b':
        return 11;
    case 'c':
        return 12;
    case 'd':
        return 13;
    case 'e':
        return 14;
    case 'f':
        return 15;
    case 'g':
        return 16;
    case 'h':
        return 17;
    case 'i':
        return 18;
    case 'j':
        return 19;
    case 'k':
        return 20;
    case 'l':
        return 21;
    case 'm':
        return 22;
    case 'n':
        return 23;
    case 'o':
        return 24;
    case 'p':
        return 25;
    case 'q':
        return 26;
    case 'r':
        return 27;
    case 's':
        return 28;
    case 't':
        return 29;
    case 'u':
        return 30;
    case 'v':
        return 31;
    case 'w':
        return 32;
    case 'x':
        return 33;
    case 'y':
        return 34;
    case 'z':
        return 35;
    case ' ':
        return 36;
    case '.':
        return 37;

    }
}

Are you a monster or just stupid?

[–] cows_are_underrated@feddit.org 5 points 1 week ago (4 children)

Just stupid

load more comments (4 replies)

[–] carrylex@lemmy.world 10 points 1 week ago

collapsed inline media

[–] daniskarma@lemmy.dbzer0.com 9 points 1 week ago (1 children)

I was hopping it was somehow badly implemented in python and each char ended up occupying 2Gb

[–] cows_are_underrated@feddit.org 6 points 1 week ago (3 children)

Hmmmmmmm, that sounds like another fun idea. Trying to make storing a single char as inefficient as possible.

load more comments (3 replies)

[–] HeyThisIsntTheYMCA@lemmy.world 9 points 1 week ago (2 children)

dammit yesterday was too long i thought this was a dnd joke at first

load more comments (2 replies)

[–] Dumhuvud@programming.dev 8 points 1 week ago (1 children)

In typical C fashion, there's undefined behavior in turn_char_to_int. xD

load more comments (1 replies)

[–] InternetCitizen2@lemmy.world 8 points 1 week ago

Yes I know, the code is probably bad, but I do not care

That's why we love it.

[–] hdsrob@lemmy.world 8 points 1 week ago (1 children)

We have a binary file that has to maintain compatibility with a 16 bit Power Basic app that hasn't been recompiled since '99 or '00. We have storage for 8 character strings in two ints , and 12 character string in two ints and two shorts.

[–] cows_are_underrated@feddit.org 7 points 1 week ago

Damn, that are setups where you have to get creative.

[–] CookieOfFortune@lemmy.world 5 points 1 week ago* (last edited 1 week ago)

I mean… you’d get better results for large data sets by just using a known compression algorithm. This is only viable for situations where you only have a small amount of data, enough computation to run this conditional, but not enough computation to run compression/decompression.

load more comments