2019-06-30 00:56:02

So, suppose you have cythonized your code into a .pyd file. But is that really secure?
I'm asking because, if I import a Cython file, I can figure out what functions and variables it has by using dir() on it. Sure, it will take some time, but does that really matter? I can still use somebody else's code and ultimately get my hands on the product.
So my question is this: can you prevent somebody from just straight up importing your cythonized modules?
For reference, I'm talking about the file you get by typing cythonize -i -3 script.py
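To illustrate the point in the question, here is a minimal sketch that uses the standard `math` module as a stand-in for a cythonized .pyd, since in CPython it is compiled C rather than Python source. A real Cython module may differ in details (for example, how `inspect` treats it can depend on the Cython version and on whether the original .py sits next to it), so treat this as an approximation.

```python
import inspect
import math  # compiled C code in CPython, standing in for a cythonized .pyd

# Anyone who can import a compiled module can still list what it exposes...
public_names = [name for name in dir(math) if not name.startswith("_")]

# ...and call those functions directly, without ever seeing source code.
root = math.sqrt(16)

# What they cannot do is ask Python for the source of a compiled function.
try:
    inspect.getsource(math.sqrt)
    source_available = True
except (TypeError, OSError):  # inspect refuses built-in (C) functions
    source_available = False
```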

2019-06-30 02:39:34

The goal of Cython isn't to prevent people from using your code indirectly (via Cython) but to prevent them from getting access to the source code. (It's also used for making code faster, but that's neither here nor there.) So I'd say 'no', unless you change the Python bytecode and alter the interpreter, which would then make your code useless to anyone who doesn't have your version of the interpreter.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2019-06-30 17:15:39 (edited by amerikranian 2019-06-30 17:16:37)

Can you elaborate on that? I thought the bytecode was set in stone. Also, thank you for clearing up the misconception about the purpose of cythonizing.

2019-06-30 17:38:20

I thought Cython was purely made for speeding up code. Any obscurity you gain is a bonus, not the main goal of the project. Regardless, your modified interpreter has to be shipped with your code, otherwise it will not run. So people can still extract your modified Python from the package and get at your code that way. Not easy, but possible. I'd say don't worry about it too much and use some basic Cython. If it's an online game, make sure never to trust anything a client sends you. Best scenario: only send the server the keys you pressed, and let the server decide what to do about them.
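A minimal sketch of that last idea, assuming a hypothetical game where the client only reports key names and the server owns all game state. The key set, the `Player` class and the map width of 10 are illustrative, not taken from any real engine.

```python
from dataclasses import dataclass

# Hypothetical set of keys a client is allowed to report.
ALLOWED_KEYS = {"left", "right", "fire"}
MAP_WIDTH = 10  # hypothetical map size, enforced only on the server


@dataclass
class Player:
    x: int = 0
    shots: int = 0


def handle_key(player: Player, key: str) -> bool:
    """Server side: apply one reported keypress; return False if rejected."""
    if key not in ALLOWED_KEYS:
        return False  # never trust the client: unknown input is dropped
    if key == "left":
        player.x = max(player.x - 1, 0)  # the server, not the client, clamps movement
    elif key == "right":
        player.x = min(player.x + 1, MAP_WIDTH)
    else:
        player.shots += 1
    return True


player = Player()
results = [handle_key(player, k) for k in ["right", "right", "fire", "teleport"]]
```

Because the client never sends positions or scores, a modified client can at worst press keys it was already allowed to press.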

Roel
golfing in the kitchen

2019-06-30 17:56:29

A question out of curiosity: how much speed do you gain by "cythonizing"?

If you want to contact me, do not use the forum PM. I respond once a year or two, when I need to write a PM myself. I apologize for the inconvenience.
Telegram: Nuno69a
E-Mail: nuno69a (at) gmail (dot) com

2019-06-30 19:16:38

nuno69 wrote:

A question out of curiosity: how much speed do you gain by "cythonizing"?


According to this source, optimized Cython code can be up to 100 times as fast as pure Python code.

Best Regards.
Hijacker

2019-06-30 20:56:34

It's not set in stone, no. Python is fully open source, so you are free to modify its bytecode definitions (they're preprocessor definitions, I believe). Cython is not designed to obscure code or protect it from unauthorized individuals; it's designed for making code faster, library embedding, etc.

2019-06-30 21:27:51

Why do you leave it as a .pyd file? Wouldn't you compile it into an executable?

2019-06-30 23:50:35

@8, no, not unless you're embedding it in a C/C++ application, and even then you wouldn't compile the Cython-generated files as an executable, since they contain no entry points.

2019-07-01 10:50:54

Ethin wrote:

@8, no, not unless you're embedding it in a C/C++ application, and even then you wouldn't compile the Cython-generated files as an executable, since they contain no entry points.


I don't know much about it, but you can still build an executable with something like PyInstaller, right?

2019-07-01 12:12:00

Well, the Cython-generated code is C/C++ compiled to machine code, but it still needs an interpreter to run on. It's basically a shared extension for Python. It's harder to read data out of it, since it's machine code rather than bytecode, but still not impossible. It's just a lot harder; reverse-engineering tools and disassemblers can still work on it.

2019-07-01 17:58:33 (edited by kianoosh 2019-07-01 17:59:08)

So if I compile all .py files in my application to .pyd with Cython, and assuming my code is optimized, do I still get the speed benefit without compiling the modules I use, such as pyglet or pyperclip?

---
Co-founder of Sonorous Arts.
Check out Sonorous Arts on github: https://github.com/sonorous-arts/
my Discord: kianoosh.shakeri2#2988

2019-07-01 19:30:40

When you really want to optimize Python code, start writing Cython code itself, meaning .pyx and .pxd files. Plain Python code can be optimized quite well when Cython compiles it, but it will never be as fast as code written in Cython proper.
Cython's magic can also only be applied to the Python/Cython code you write yourself, not to the packages you use. This means your own functions that call pyglet can be optimized, but pyglet's internals won't be optimized by Cython.

2019-07-03 07:01:00

Right, just a couple of things. Protecting code was a huge reason I avoided Python for years: what's the point when anyone can just run uncompyle6 on your code? It fell mostly on me, with some help, to figure out a way to protect NVG. These are my findings. First, Python bytecode has nothing to do with Cython, so you do not need to modify Python when working with Cython. Doing so will not add any protection, and other C extensions will likely fail to load. I can't think of much you'd modify, maybe some function names or something like that, but it's pointless and will cause so many complications that it's not even worth it, especially because someone could just copy your modded Python DLL and import your module anyway. Scrambling opcodes is a way to protect code, but it can be broken if you are smart about it, so I wouldn't rely on it. If you want to use the Python compiler and ship .pyc files like PyInstaller does by default, by all means download Python, scramble your opcodes and rebuild it. It'll work, and it will stop most people who don't know what they are doing from running uncompyle6, but if one person with the right knowledge comes around, they will destroy your opcode scrambling in about 20 minutes. As far as I've learned, the correct thing to do is indeed to use Cython. When you cythonize code, it is turned into C and then compiled to machine instructions. This means someone can run something like IDA on it and get assembly instructions, but because a lot of the grunt work is handled by Python, it will be very difficult to understand. Needless to say, you won't have to worry about someone getting the assembly instructions, modifying them and releasing their own game.
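To see why plain .pyc files hide so little, here is a minimal sketch using the standard `dis` module: the instructions and constants of any function are readable with no special tools, which is the raw material decompilers such as uncompyle6 work from. `check_token` is a hypothetical example function, and the numeric opcode values vary between Python versions.

```python
import dis


def check_token(token):
    # A toy check like the ones discussed in this thread.
    return token == 927184


# Stock CPython can list a function's bytecode instructions...
instructions = list(dis.get_instructions(check_token))
opnames = [ins.opname for ins in instructions]

# ...and the constants it uses, including the "secret" number.
constants = check_token.__code__.co_consts

# dis.opmap is the opcode-name -> number table; scrambling opcodes means
# rebuilding Python with these numbers shuffled.
compare_op_number = dis.opmap["COMPARE_OP"]
```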
I won't say it's impossible, because saying anything is impossible is just having a closed mindset. I'm sure there is someone out there who could somehow break stuff, and there is certainly hex editing that could be done to the already compiled package, or other disassembly/modification methods, but Cython is the very best protection I think you can possibly get. But I haven't really answered your question yet, have I? You want to know whether Cython is a secure option, given that anyone can import modules compiled with it. The answer is that if you use Cython alone, no, it's not: anyone can import your module and call its functions, but they will never be able to recover runnable Python source code from just the .pyd file, at least as far as I know. It's turned into C and then compiled into machine code, so the original code doesn't exist in the .pyd in any form, just the logic, which because of Cython is really, really hard to work out from assembly instructions. Basically, to the best of my knowledge, the technology to turn .pyd files back into runnable code doesn't exist at this time, and likely won't for a very long time; if it ever does, it will probably use technology beyond what I can imagine lol. So anyway, no, Cython alone will not stop someone from importing your modules. However, Cython plus a bit of creativity can lead to the best code protection system you could ever hope to have. Remember, all the top-level code in a file being imported executes. This means that with a bit of creativity, you can make the import fail spectacularly if something doesn't meet your conditions. For example, imagine you create a variable called token. By default it's set to 0. In all your important functions, you check whether the token is equal to 927184. If it's not, you cause some sort of fatal crash that stops the app from working.
There are many ways to do this. You can import wx, create a wx.App object, then destroy it and call one of its methods, for example, and it'll cause Python to die. I'll let you figure that one out on your own. Or if you don't want a crash, you can show an error message and exit, or even just make the function return. Don't make a function for token verification; copy the check into each function so it can't be monkeypatched. Now if someone imports your module and tries to misuse it, they can't, because the token isn't set to 927184. So unless they know what number to set it to, they can't do anything but call dir and get a list of variables or something. In your game, when you import the module, just set that token and your module will work fine. Maybe you can create a variable in the sys module before importing your module. At the top of that module, just have the line sys.varname. If it doesn't exist, it'll raise a traceback and the import will fail. I have other ideas, but I'm using some of them, so I don't want to share them; the idea is just to find a way to be creative. For example, say you're making a game that is going to be compiled into an executable. You know exactly what you need. Do you think you'll need the dir function? Probably not. So in your protected modules, just monkeypatch the dir function with your own version that returns an empty list. Why not? You won't need it. Then if someone imports your module, they can't use dir to see what's in the modules. Just stuff like that. Cython plus creativity is, as far as I know, the best protection you can hope to get for your app in Python, where most things are open source. Just remember that there is no perfect system, and someone will eventually come around and crack it no matter what you do. But the combination of Cython, which stops people from getting the original code, and creativity can hopefully outsmart most people. Just think of something clever. I mean, at the root, that's all encryption really is.
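A minimal sketch of the sys-attribute trick described above. The attribute name `mygame_launcher_flag` and module name `protected_mod` are hypothetical, and a plain .py file written to a temporary folder stands in for the compiled .pyd; the mechanism should be the same because all top-level code in a module runs when it is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical protected module; a plain .py stands in for the .pyd.
# Its first statement touches an attribute that only the real launcher creates.
PROTECTED_SOURCE = '''
import sys
sys.mygame_launcher_flag  # AttributeError, and so a failed import, if missing

def secret():
    return "game logic"
'''


def try_import():
    """Import the hypothetical 'protected_mod', returning None if it refuses."""
    sys.modules.pop("protected_mod", None)  # make sure the import really runs
    try:
        return importlib.import_module("protected_mod")
    except AttributeError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "protected_mod.py").write_text(PROTECTED_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    stranger = try_import()          # someone importing the module directly
    sys.mygame_launcher_flag = True  # what the real game does before importing
    game = try_import()
    result = game.secret() if game else None

    sys.path.remove(tmp)
```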
When making a secret code, the idea is to come up with something more clever than the person you're keeping the secret from. It's the same here. Knowing that there is a remarkably high chance that no one will be able to disassemble your .pyd files well enough to figure out the logic, come up with some clever way to make the import fail if it doesn't meet your conditions. Just don't tell anyone what it is.

I am a web designer, and a game developer. If you wish see me at http://www.samtupy.com

2019-07-03 08:54:12

@14, very good reasoning, and extra props for understanding the roots of cryptography.

2019-07-03 11:33:07

@14, you are right.
Now, here's my advice for protecting cythonized code:
1. Cythonize your Python code.
2. After cythonizing, pack it with a packer like nspack, UPX (not recommended), ASPack, etc.
3. Use PyInstaller to make an executable (this time, don't enable UPX in the spec file, otherwise it will fail to load).
4. After your executable is made, pack the executable (with something that doesn't break it), because PyInstaller appends your compiled code to the end of the executable, and some packers and cryptors break it.
5. Last but not least, you can use native C extensions with Cython.
Enjoy!

2019-07-03 12:36:10

From what I know, you can't monkeypatch the dir function unless the person who's trying to dir your .pyd file does from dotpydfile import *, which replaces their dir function with the one you specify in the .pyd file. Or maybe there's another way of monkeypatching it. If so, please let me know; I think I'll need it soon.

2019-07-03 17:23:21

There's another way around it:
suppose you wrote a module in Cython that overrides dir().
If the person uses del to delete the override, they can use dir() again.
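A minimal sketch of that point, assuming the override is just a module-level name (for example one pulled in with `from dotpydfile import *`, as in the post above): deleting the name lets the real built-in show through again. `fake_dir` is a hypothetical stand-in for the override.

```python
def fake_dir(*args):
    """Stand-in for an overridden dir() that hides everything."""
    return []


# Shadowing: binding the name 'dir' in this module hides the built-in here only.
dir = fake_dir
shadowed = dir()

# Deleting the module-level name makes lookups fall back to the built-in again.
del dir
restored = dir()  # the real dir(): names defined in this module
```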

2019-07-03 20:07:16

But the token should work fine. You could use, say, a 10-digit token, so guessing it is nearly impossible too.
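A minimal worked example of how big a 10-digit keyspace is, with the token picked by Python's `secrets` module instead of by hand. The attacker speed of one million guesses per second is purely an assumption for the arithmetic, not a measured figure.

```python
import secrets

DIGITS = 10
keyspace = 10 ** DIGITS  # every value from 0000000000 to 9999999999

# Let a cryptographically secure generator pick the token instead of choosing
# a memorable number by hand.
token = secrets.randbelow(keyspace)
token_text = f"{token:0{DIGITS}d}"  # zero-padded, always exactly 10 digits

# Hypothetical attacker loop that can try one million values per second.
GUESSES_PER_SECOND = 1_000_000
worst_case_hours = keyspace / GUESSES_PER_SECOND / 3600
```

At that assumed rate, trying every value takes 10^10 / 10^6 = 10,000 seconds, roughly 2.8 hours, so the length of the token alone is not what keeps a determined attacker out.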

2019-07-03 20:55:47 (edited by defender 2019-07-03 20:57:00)

I'm not very knowledgeable on this topic, but along the same lines as what Sam was saying, couldn't you also hide little indicators in the contents/metadata of some of the files needed to run the program, or even tiny, seemingly innocuous but unique details in the actual game itself?
I'm sure others can come up with far better ways of doing this than me, but I figure as long as you keep your watermarks simple and discreet, and not in an obvious place, they will be a lot harder to find.
In this way, you can hopefully confirm whether the code is being reused if you find a suspiciously similar game.
Including some kind of license, no matter how basic, with any open source projects, and requiring that those who use it package that file with their games and also make them open source, could help as well, even if only to keep track of who is using what and to remove any question of legality.
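A minimal sketch of the watermark idea, assuming a plain-text data file, a hypothetical private key and a hypothetical build name. Deriving the tag with HMAC from the standard `hmac` module is a swapped-in choice, not something from the post: it means nobody without the key can produce the same tag, and you can later check a suspect copy for it. Whether a "# rev" line is harmless depends on your real file format.

```python
import hashlib
import hmac

# Hypothetical private key: it stays with the developer and is never shipped.
SECRET_KEY = b"keep-this-out-of-the-release"


def make_watermark(build_id: str) -> str:
    """Derive a short, innocuous-looking tag that only the key holder can compute."""
    digest = hmac.new(SECRET_KEY, build_id.encode(), hashlib.sha256).hexdigest()
    return digest[:12]


def embed(data: bytes, build_id: str) -> bytes:
    """Append the tag to a data file so it looks like a harmless revision note."""
    return data + b"\n# rev " + make_watermark(build_id).encode() + b"\n"


def has_watermark(data: bytes, build_id: str) -> bool:
    """Check whether a suspect file carries the tag for this build."""
    return make_watermark(build_id).encode() in data


original = b"level=1\nmusic=intro.ogg\n"
shipped = embed(original, "release-1.0")
```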

2019-07-04 03:22:57

Overriding any built-in function in Python is trivial:

import builtins
# modify dir...
def dir(object=None):
    return []

builtins.dir = dir

You can do the same with any other built-in function. Be careful, though: you can break the interpreter if you're not!
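A minimal sketch of what patching `builtins` changes compared with a plain module-level override, using only the standard `builtins` module (`empty_dir` and `original_dir` are illustrative names). The patch affects every module in the running interpreter, and keeping a reference to the original is what lets you put it back, for example around your own code that still needs dir().

```python
import builtins

original_dir = builtins.dir  # keep a reference so the patch can be undone


def empty_dir(*args):
    """Replacement that hides everything."""
    return []


builtins.dir = empty_dir
# No module-level 'dir' is defined here, so this call finds the patched built-in;
# the same is true in every other module of the running interpreter.
patched = dir(builtins)

builtins.dir = original_dir  # restore the real built-in
restored = dir(builtins)
```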

2019-07-04 03:40:43

The thing is, if they do del builtins.dir, all your work will be gone.

2019-07-04 03:47:28

@21, that worked, thank you.
@defender, yeah, that's a good idea. And it is very likely to work, since people are going to have a hard time converting .pyd files back into pure, readable code, so they probably won't even find the license or whatever the dev puts in their application. Maybe this way we can get rid of a lot of the negative stuff happening in our community...
@22, I tried builtins.dir, and it still returns an empty list, so that did work.

2019-07-04 03:49:47

+1 to Sam, who really hit the nail on the head and eloquently said what I was thinking. I might have a bit more to add at some point, but the one thing I'll address here is monkeypatching and the dir function.

Monkeypatching can be great at times, no doubt about that. But it most certainly comes at a price, with its own risks and limitations. What I say here applies in many different cases; I'm using dir since it's a good example.

Disassembly time. I managed to import your module, so yay for that, I guess.

>>> dir()
[]

Uhh, that's strange. dir should never do that.
Let's have a quick look-see. Want dir back?

>>> dir = lambda : []
>>> dir()
[]
#thanks stackoverflow.
>>> dir = [t for t in ().__class__.__base__.__subclasses__() if t.__name__ == 'Sized'][0].__len__.__globals__['__builtins__']["dir"]

Now call dir.

Another downside is the very real possibility that dir is actually used in code. Ugly code, but code nevertheless. In many cases, that 364KB module you're including for usability could have a nasty block of code looking something like:

[i for i in dir() if not i.startswith("_") and blablabla]

I've regrettably written such code before, just because it was really easy at the time. Case in point: whenever you modify anything like that by monkeypatching, you've gotta be careful and really think it through. Especially in the standard library!

You could always make slight modifications to the shipped interpreter. Nothing major enough to degrade functionality down the road, just variables and/or attributes in places they don't need to be. As long as you remain inconspicuous and original, you're giving any potential attacker a serious run for his money.

2019-07-06 21:00:20 (edited by keithwipf1 2019-07-13 22:42:12)

You might want to be careful of tracebacks.
They can give away things you don't want given away, like which attribute you're checking for a token. So make sure to check that it exists first, and crash if it doesn't, probably by doing something like
x = lambda x: x(x)
x(x)
In pure Python this just raises a RecursionError (with a traceback) rather than crashing, but I think it might crash outright if Cython runs it, which would take the interpreter down.
These are just ideas, and they might or might not be good ones.
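A minimal sketch of checking for the token without leaking its name in a traceback, assuming the token lives on some object the launcher prepares and that quietly exiting is an acceptable "crash". `holder`, `game_token` and `start_game` are hypothetical names.

```python
import types


def start_game(holder):
    # getattr with a default never raises, so a missing attribute produces no
    # AttributeError traceback that would name the attribute being checked.
    if getattr(holder, "game_token", None) != 927184:
        # An uncaught SystemExit with an integer code ends the program
        # without printing a traceback.
        raise SystemExit(1)
    return "running"


# Hypothetical object the real launcher sets up before calling in.
launcher = types.SimpleNamespace(game_token=927184)
status = start_game(launcher)
```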