2018-11-23 01:13:10

Hi there guys.
I'm glad to say that I've finally released a packfile object in Python which can help you pack game data such as sounds.
For further documentation, please visit the link below.
pypackfile
Since my communication and writing skills are fucked up beyond hope, feel free to ask here if you don't understand something in the documentation.
Happy coding, and I hope it can help.

Paul

2018-11-23 03:29:16

Just poking through this, and I've already noticed some huge concerns:
1. Why are you using cryptodome, and why are you directly calling the AES cryptographic functions?
2. Why are you using CFB mode? How are the bits of the cipher encryption determined? Is this AES-128-CFB, AES-192-CFB or AES-256-CFB?
3. Why are you not using Cryptography with AES-256-GCM? Cryptography offers an easy-to-use interface, and you do not need to pass in an IV, only a key. Furthermore, you are requiring users of your code to pass in an IV, which (about 80-90 percent of the time) is done incorrectly, or an IV that is too large or too small is passed in, resulting in dangerously broken (or weak) cryptography, or other vulnerabilities. Do not rely on your users to pass in secure information! Treat everything as dangerous until proven otherwise! As LVH said in his Crypto101 talk, "If you're typing the letters AES into your code, you're doing it wrong. If you're typing the letters DES into your code, you're doing it extra wrong, and if you're typing the letters MD5 or SHA into your code, then you might be wrong."
4. Why are you using the zipfile module? Why not something better, like bzip2 or zstandard?
5. In your read_file() method, you explicitly do a check for the bytes '1x01xp'. Why do you do this, and do you understand the consequences of modifying ciphertext or raw data such as sounds and other binary data in this way?
6. Nowhere in your code do I see an attempt to encrypt the zipfile itself. This could open a whole can of worms, since anyone with even rudimentary Python knowledge could get access to the encrypted data. Granted, it is encrypted, but since you are relying on the user to pass in safe and good keys and acceptable IVs, you are only risking the security of the data by making it possible for users to pass in short keys.
And now, my recommendations:
1. Switch to using cryptography. Cryptodome did not compile on Python 3.x, last I checked.
2. Rewrite your encryption methods to use Cryptography's Fernet system. Do not use the low-level Hazmat layer unless you know what you are doing.
3. Put a strict hard limit on how few characters can be passed in. Require users to pass in a string of at least 12-18 characters as their key. Anything longer than the hard lower limit should be accepted; however, you should probably run a password strength check to ensure it gets the highest score possible before using it as a key. If it fails the test, reject it.
4. Switch to using Zstandard, Bzip2, Gzip, etc., instead of zipfiles, and tune the algorithm correctly to achieve the smallest size possible.
5. Encrypt the archive. Do not finish until the archive has been encrypted.
6. If you can, attempt to securely wipe the memory that holds the key that the user passes in. Do not keep it around! Check out https://stackoverflow.com/questions/728 … ory-python and https://www.sjoerdlangkemper.nl/2016/06 … -in-python for possibilities.
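
To make recommendation 3 concrete, here is a minimal sketch using only the standard library. The helper name derive_key, the 12-character minimum and the iteration count are assumptions for illustration, not anything pypackfile actually does; it rejects short passphrases and stretches the rest into a 32-byte key with PBKDF2.

```python
import hashlib
import os

MIN_PASSPHRASE_LENGTH = 12  # assumption: the low end of the 12-18 range suggested above


def derive_key(passphrase, salt=None, iterations=200_000):
    """Reject short passphrases, then stretch the rest into a 32-byte key."""
    if len(passphrase) < MIN_PASSPHRASE_LENGTH:
        raise ValueError("passphrase must be at least %d characters" % MIN_PASSPHRASE_LENGTH)
    if salt is None:
        salt = os.urandom(16)  # fresh random salt; it has to be stored to re-derive the key
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt, iterations)
    return key, salt
```

A length check is not a strength check, so a real password-strength test would still have to be layered on top of this.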

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2018-11-23 07:53:31 (edited by pauliyobo 2018-11-23 07:54:37)

@2
Regarding points 1 and 2 of the concerns:
the encryption I'm using is just an example; everyone is free to change it if they wish when they use it.

There is no particular reason for me using zipfile. This is just an old project I had lying around, and I decided to finish it while keeping zipfile.
In read_file, I check for that piece of information because, if it's found, the data will be decoded from bytes into a string. I basically needed to recognise data that was, for example, a text file. If it was a text file, I would add b'x01xp' when I added the file, so that when it was read, it would be parsed accordingly.
True, the zipfile itself is not encrypted; I'll be working on that.
Cryptodome does compile on 3.x
I like the password verification idea.
Thanks for your suggestions.
P.S. I also used cryptodome for its speed. How fast is the other module you suggested?
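
For anyone wondering what the marker check in read_file amounts to, here is a rough sketch of the idea. The names TEXT_MARKER, tag_text and read_payload and the marker value are hypothetical and only for illustration; the real code may differ.

```python
TEXT_MARKER = b"\x01xp"  # hypothetical marker value, for illustration only


def tag_text(text):
    """Encode a string and prefix it so the reader knows to decode it."""
    return TEXT_MARKER + text.encode("utf-8")


def read_payload(raw):
    """Return str for tagged text, and the untouched bytes for everything else."""
    if raw.startswith(TEXT_MARKER):
        return raw[len(TEXT_MARKER):].decode("utf-8")
    return raw
```

The catch is that binary data which happens to start with the same bytes would be treated as text, which is the risk raised in the concerns above.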

Paul

2018-11-23 08:23:00 (edited by Ethin 2018-11-23 08:23:14)

@3, ah. Cryptography is probably better and far more up to date than Cryptodome is. It's very fast if you use it properly, and it's very good at what it does.


2018-11-23 09:25:07

I will take a look at it. Thanks.

Paul

2018-11-23 11:50:18 (edited by cartertemm 2018-11-23 12:08:21)

@2

So first, why not use cryptodome, and why not directly call the AES cryptographic functions?
If this is bad blood between the community and pycrypto, then yes, I can understand. Pycrypto was riddled with vulnerabilities after energy for the project died and it stopped being maintained. However, Paul is using pycryptodome/pycryptodomex. The last commit to the repository was 2 days ago. One of the major points of cryptodome was to work on Python 3; another was, obviously, addressing the security concerns. Can you provide any legit reasons he *shouldn't* be using it, other than the fact that you and some other folks seem to like cryptography instead?

Oh yeah, and hold on. Are you sure this guy is a reputable source of information? Because hate to say it, but... I'm typing AES into my code and I'm not doing it wrong. I'm using a hashlib sha function, yes I just typed SHA, and I'm still breathing on. There's a whole other reason md5 isn't recommended, but I hardly think that was the point. I have no idea where this logic comes from and I don't have the time to watch a one hour talk to figure it out.

Agreed in reference to IV's, those should probably be dealt with by the programmer.


I'm laughing right now. It just goes to show that the advice given on Stack Overflow is really, really, really not always helpful. I don't know why you'd send a link that basically says that what you're trying to tell OP to do isn't possible in his language of choice, but that's neither here nor there. Never, never, never, and may I repeat, never rely on del for securely wiping memory. Like, I don't even know why that was suggested. Removing the reference in memory is not removing the content from memory. That advice is about the dumbest thing I've heard this week, and it's American Thanksgiving with the family, so trust me, I've heard enough stupid shit to last quite a while. The second post provides another solution, but it doesn't exactly seem practical. People concerned with securely wiping memory, in Python of all languages, should smell the coffee and focus on the more obvious attack vectors. Let's assume you decide to run pyinstaller or py2exe to bundle that wonderful encryption into a standalone frozen executable for the masses to enjoy. Without protection, I could modify Python modules, recompile with compileall and stick them into your source tree. Of course the running script wouldn't care; it'd happily spit out the info I'm requesting. But I digress.

2018-11-23 19:31:41

I don't know who gave that talk about typing AES being bad, but it's silly, absurd and makes no sense. Calling a function isn't a problem. As Carter said, there's also nothing wrong with the OP's library.
My question is why the fetish for encrypting the entire file. If you encrypt the names and the contents, does it matter? If you encrypt the whole file, you have to read it into memory and decrypt the whole thing, which puts a very hard size limit on your packfiles. Encrypting files and leaving the zip file alone means you can just find the files you need and decrypt them.

2018-11-23 19:36:34

Now that I think on it, why does it matter if the OP uses zipfile vs anything else? You would want to compress, then encrypt anyway, meaning that the final archive type should honestly probably just be tar or something similar. Sorry for the multiple posts. Ethin did seem to dial back his "Any other comp science person would just laugh at you and ask if you're from mars," now we just need to work on this terrible misinformation issue.
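
If anyone wants to see why the order is compress first, then encrypt, here is a quick sketch. Random bytes from os.urandom stand in for ciphertext here (that substitution is my assumption; no real encryption happens), since good ciphertext looks random to a compressor.

```python
import os
import zlib

# Repetitive plaintext compresses well.
plain = b"footstep footstep footstep " * 4096
# Random bytes stand in for ciphertext here; good ciphertext looks random.
cipher_like = os.urandom(len(plain))

compressed_plain = zlib.compress(plain)
compressed_cipher_like = zlib.compress(cipher_like)

print(len(plain), "->", len(compressed_plain))
print(len(cipher_like), "->", len(compressed_cipher_like))
```

The first line should show a large reduction and the second essentially none, so compressing after encrypting buys you nothing.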

2018-11-23 21:34:14

I suggested using a better compression module primarily to reduce the size of the overall pack file. With Zstandard, for example, you can pass it about a hundred MB (or even just a few hundred KB) of data to "train" it, then you fuck with the compression level (which could be done automatically, by testing the size of the output at each level to find the smallest one) so that we aren't distributing (possibly gigabytes of) game data around, with that size mostly consisting of the pack file. All of my suggestions were just that, suggestions. No one has to actually use them. As for the AES/SHA thing, that was a summary of the Crypto101 book, available at https://www.crypto101.io. The reason he made that one of his opening statements was that some people (especially those who aren't well trained in cryptography) tend to use raw APIs (that is, APIs that are low-level and offer high control over the cryptography process, like Cryptography's Hazmat layer) when the library offers a better (and probably safer) version, or they tend to write their own crypto scheme.
Clearly, pauliyobo was not doing that, but inserting that was more of a cautionary note than anything else. Plus, that brought me nicely into the IV/key issue.
And as for encrypting the entire file, that was, again, a suggestion; you don't need to do it. @8, hate to say it, man, but you certainly have the same attitude of "Any other comp science person would just laugh at you and ask if you're from mars"; and yes, while you do back up your posts with links and stats, your general attitude has that kind of feel. So, really, it's something both of us need to work on, though I don't usually exhibit that attitude off-forum (how odd!). :D
Now, to answer the final question: why should you use pyca/cryptography over cryptodome/pycryptodome/pycryptodomex? That was partially my personal opinion/bias, though Cryptography does offer its Fernet module, which abstracts away all the IV complexities, the encryption scheme and such, so all you have to do is:

from cryptography.fernet import Fernet
# either allow Fernet to create a key for you:
key = Fernet.generate_key()
# or reuse one you saved earlier; it must be 32 url-safe base64-encoded bytes,
# so an arbitrary password string will be rejected:
# key = previously_saved_key
# initialize Fernet
f = Fernet(key)
# encrypt data
data = f.encrypt("data".encode())

As for securely wiping memory, I've been doing my own research on that particular problem; Python isn't exactly an OK security instrument (of sorts) if you can't do that. So far, I've found this issue on bugs.python.org, this (https://softwareengineering.stackexchan … -in-memory), and, of course, this known security limitations document (https://cryptography.io/en/latest/limitations). That document does state:

Memory wiping is used to protect secret data or key material from attackers with access to uninitialized memory. This can be either because the attacker has some kind of local user access or because of how other software uses uninitialized memory.
Python exposes no API for us to implement this reliably and as such almost all software in Python is potentially vulnerable to this attack. The CERT secure coding guidelines assesses this issue as “Severity: medium, Likelihood: unlikely, Remediation Cost: expensive to repair” and we do not consider this a high risk for most users.

However, I did preface that suggestion with "if you can," so I was really hoping to find a way, but my hopes aren't very high. The issue I linked to is probably the only true lifeline for secure wiping we have, if they even choose to implement that, though I'd think they might just bring in something like Monocypher and its memory wiping functionality to do that instead.
As for the stack overflow answers, yeah, I've found some pretty dumb ones on there, though about 80-90 percent of the time I've found stack overflow to be very useful.
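
On the compression-level idea from earlier in this post, the "try every level and keep the smallest" part can be sketched like this. I'm swapping in the standard library's zlib for Zstandard so the example has no dependencies, and the function name smallest_compression is made up for illustration.

```python
import zlib


def smallest_compression(data, levels=range(1, 10)):
    """Compress data at each level and keep the smallest result."""
    best_level, best_blob = None, None
    for level in levels:
        blob = zlib.compress(data, level)
        if best_blob is None or len(blob) < len(best_blob):
            best_level, best_blob = level, blob
    return best_level, best_blob
```

This trades packing time for size, which is usually fine because packing happens once and reading happens many times.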


2019-05-19 13:21:38

Hi there.
The packfile has been stabilized. It now generates a random IV if none is supplied, or it completes one if the length is incorrect.
Feel free to look at the documentation in the GitHub repo, since there are a few changes.
If you have any suggestions, feel free to post them.
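
Roughly, the IV handling can be sketched as below. The helper name resolve_iv is hypothetical, and note one deliberate difference: this sketch raises an error on a wrong-length IV instead of completing it, which is an assumption of mine and not what the packfile does.

```python
import secrets

AES_BLOCK_SIZE = 16  # AES uses a 16-byte block, so CFB mode needs a 16-byte IV


def resolve_iv(iv=None):
    """Return a usable IV: a fresh random one if none is given."""
    if iv is None:
        return secrets.token_bytes(AES_BLOCK_SIZE)
    if len(iv) != AES_BLOCK_SIZE:
        # Assumption: fail loudly instead of padding or truncating the IV.
        raise ValueError("IV must be exactly %d bytes" % AES_BLOCK_SIZE)
    return iv
```
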

Paul

2019-05-19 15:38:01 (edited by amerikranian 2019-05-19 15:41:09)

I just tried to compress a 386 MB sound file. The program sat there and then promptly crashed. I passed in the IV and True for both the encrypt and decompress arguments.
To add to that, normal encryption works great.

2019-05-19 16:37:25

Have you got any tracebacks? One would be helpful.
By sound file, do you mean an actual sound? Because if so, that makes me wonder why you would use a 386 MB sound file :D

Paul

2019-05-19 18:08:50 (edited by amerikranian 2019-05-19 18:09:12)

I'm so sorry. I will try and remember to provide a traceback in the future with any reports I will give, assuming that I can get one.
Begin traceback:

Traceback (most recent call last):
  File "C:\Users\amerikranian\Downloads\pypackfile\packer.py", line 91, in <module>
    p.create()
  File "C:\Users\amerikranian\Downloads\pypackfile\packer.py", line 24, in create
    if self.compressed: self.compress()
  File "C:\Users\amerikranian\Downloads\pypackfile\packer.py", line 79, in compress
    f.write(zlib.compress(data))
MemoryError

Also, the sound file wasn't 386 MB; the entire folder added up to that number.

2019-05-19 18:12:42

Oh, the nice MemoryError comes back once again. Ha.

Paul

2019-05-19 18:22:21

Try now. I added a little tweak which should fix your problem, even though you might not be able to use compress() and decompress().

Paul

2019-05-19 18:36:13

It works fine now, though if possible I would like to use compress and decompress, because they stopped the user from opening up the archive and seeing the names of the files.

2019-05-19 18:50:53

Sorry for the double post. After I got a memory error I've been doing some reading and here's what I've found.
The common thing people do to handle large files is use zlib.compressobj(), then compress the file by loading it bit by bit, passing each piece to the compressing object's compress() method and calling flush() at the end. When I tried to do the same, however, I got the same memory error. Maybe you'll have better luck at it than I did?
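
For reference, the chunked approach looks roughly like this with the standard library. The function names and the 64 KiB chunk size are my own choices for the sketch; the key point is that only one chunk is held in memory at a time, so reading the whole file first and then feeding it to the compressor would still blow up.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # read 64 KiB at a time; the exact size is an arbitrary choice


def compress_file(src_path, dst_path, level=6):
    """Compress src_path into dst_path without holding the whole file in memory."""
    compressor = zlib.compressobj(level)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(compressor.compress(chunk))
        dst.write(compressor.flush())  # emit whatever the compressor still buffers


def decompress_file(src_path, dst_path):
    """Reverse compress_file, again working one chunk at a time."""
    decompressor = zlib.decompressobj()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(decompressor.decompress(chunk))
        dst.write(decompressor.flush())
```
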

2019-05-20 01:41:34

Hi there.
The compress and decompress functions seem not to do what they should.
The data is compressed and encrypted, but 7-Zip can still extract the file names.
One thing I can do is try to encode the names, maybe in base64, or obfuscate them in some way.
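
A minimal sketch of the base64 idea, with hypothetical helper names (encode_name and decode_name are not part of the packfile):

```python
import base64


def encode_name(name):
    """Turn a member name into an opaque-looking token with no path separators."""
    return base64.urlsafe_b64encode(name.encode("utf-8")).decode("ascii")


def decode_name(token):
    """Recover the original member name from its token."""
    return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
```

Keep in mind that base64 is trivially reversible, so this only hides the names from a casual look in 7-Zip; it is obfuscation, not encryption.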

Paul

2019-06-01 16:13:11

So I'm trying to use this again, and we have a problem:

import packer
packer=packer.packer("sounds","snds.dat",False)

This produces no directory with my sounds. No snds.dat is created or anything. Do you have any ideas?
It also doesn't show any tracebacks.

2019-06-02 18:21:33 (edited by pauliyobo 2019-06-02 18:21:54)

Hi there. There was an error in the indentation; the packer should now work properly.
Also, remember that the first argument to supply is the pack name, not the directory. The directory is the second argument.

Paul

2019-06-08 17:16:26 (edited by amerikranian 2019-06-08 17:17:36)

So, we have another issue:

import packer,sound
pack=packer.packer("sounds.dat","sounds",True,"1111111111111111","1111111111111111")
snd=sound.sound()
snd.load(pack.read_file("amb.ogg"),True)
snd.play()
while snd.handle.is_playing: pass

The code works fine up to creating sounds.dat; here's where it crashes:

Traceback (most recent call last):
  File "C:\Users\amerikranian\Desktop\simon\dscript.py", line 4, in <module>
    snd.load(pack.read_file("amb.ogg"),True)
  File "C:\Users\amerikranian\Desktop\simon\packer.py", line 86, in read_file
    if self.encrypted: string = zlib.decompress(decrypt(zip.open(file).read(), self.key, self.IV))
zlib.error: Error -3 while decompressing data: incorrect header check

2019-06-09 20:38:58

It should be fixed. Also, thank you for the reporting you've been doing. I really appreciate it.

Paul

2019-06-11 00:54:32

No problem. I will not get to test it until Saturday, but I will let you know the result then.