Topic: I want to create an assembly compiler
Daywalker
« on: June 15, 2018, 07:41:31 AM »

Hi guys, I will be making an IA-32 assembly compiler soon. I'm opening this topic now to let you all know, and to let anyone who wants to participate join in.

If you have knowledge of assembly or machine code instructions, then you may be useful for this project, as I have never before done something on this level.

Open discussion, let me know your thoughts. So far I've learned assembly, but have yet to understand its binary composition at the lowest level (I know how to compile a jmp $ instruction.. but that's about it lol).

I will be posting further updates here, and hope to make the compiler available and open source to the general public. (This is a side project to my current dev log.)

Daid
« Reply #1 on: June 16, 2018, 02:26:21 AM »

First question that comes to mind is, why? For using it, it's not like there are no valid alternatives: https://en.wikipedia.org/wiki/Comparison_of_assemblers#x86_assemblers

For using it, everything is moving to 64-bit these days. So x86-64 makes a lot more sense from that perspective if you want to squeeze performance.

As a practice project, IA-32 is one of the largest and most complex instruction sets available, which will make your life hard. You would be better off with something like the Z80 (the Game Boy uses a close relative of it, so it's easy to test in an emulator)


(Getting down to the bare metal of a machine is my area of expertise, so feel free to shoot any questions here. But I won't directly help with your project, due to me spending my main time somewhere else)

Daywalker
« Reply #2 on: June 16, 2018, 03:34:56 AM »

Quote from: Daid
First question that comes to mind is, why? For using it, it's not like there are no valid alternatives: https://en.wikipedia.org/wiki/Comparison_of_assemblers#x86_assemblers

Many reasons (which I could explain one by one, but it'll be a long.. technical post). But mostly because I wish to understand how a machine works at the lowest level, and I also wish to understand the things compilers do.

One thing I have noticed is that most compilers don't seem to be able to process machine-code structures.. only machine-code ascii files (a.k.a. text files)

This is a problem for me also


Quote from: Daid
For using it, everything is moving to 64-bit these days. So x86-64 makes a lot more sense from that perspective if you want to squeeze performance.

Performance wise I'm not currently too worried about differences between 32-bit and 64-bit, though I'm sure I might consider it in the future


Quote from: Daid
As a practice project, IA-32 is one of the largest and most complex instruction sets available, which will make your life hard. You would be better off with something like the Z80 (the Game Boy uses a close relative of it, so it's easy to test in an emulator)

I already know the instruction set, yet this project will only take a small subset of the instruction set, which I shall decree to be useful. This is not a full, general purpose compiler I guess. I will be cutting off a lot of system/cpu-specific functionality, or any instructions I don't like, though again, I can decide to add them in the future should they become necessary/useful.


Quote from: Daid
(Getting down to the bare metal of a machine is my area of expertise, so feel free to shoot any questions here. But I won't directly help with your project, due to me spending my main time somewhere else)

Thanks, appreciate it, when I start to tackle this I'm sure I'll have a few :)


[EDIT 1]

I suppose I should give you all some context, in order for you to better understand my reasoning to undertake this project at this specific point in time.

See, I don't like C++..
I don't like C..
I don't like any of the currently existing languages..
In fact.. I don't like Windows, and I don't like any operating systems that currently exist.. (and I am a computer-programming obsessed freak.. well, maybe you will find a few "n++" statements flowing down my bloodstream lol)

Right now I have a language called the "Shadow" language, which currently compiles to a high-level language called "GML" (Game Maker Language). Basically, I want to make my language much more powerful, and less system/application specific.

I will root my language in the assembly language, yet for this I don't need to make my own compiler - but whatever compiler I am using.. I don't want it to be slow or do unnecessary calculations (I'm a perfectionist.. yes.. it bothers me..)

Considering I will be cutting off a lot of functionality, I don't want to use a compiler which, for example, checks for instructions which I will never use. First, I have to know the workings behind the compiler (there's currently only one ASM compiler which I am interested in, which is FASM)

Also, I have seen some of the documentation stating that FASM does "multiple passes" and tries to "predict" stuff, and I didn't understand how on earth that makes sense. So to figure it out, I want to create my own compiler, and see if those "passes" are actually necessary.
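
A minimal sketch of why more than one pass can be needed (an illustration only, not FASM's actual algorithm): in 32-bit code, JMP has a 2-byte short form (EB rel8) and a 5-byte near form (E9 rel32). Which one fits depends on the distance to the target, and that distance depends on the sizes of the instructions in between, including other jumps. The names Item and resolveSizes are made up for this example.

```cpp
#include <cstddef>
#include <vector>

// One item in a toy program: either a block of fixed-size bytes or a jump.
struct Item {
    bool isJump;     // true for a JMP, false for plain bytes
    int  fixedSize;  // size in bytes when isJump is false
    int  target;     // index of the item jumped to when isJump is true
    int  size;       // current size estimate (2 = short form, 5 = near form)
};

// Repeats passes until no jump has to grow. Returns the number of passes used.
int resolveSizes(std::vector<Item>& items) {
    // Optimistic start: assume every jump fits in the short form.
    for (Item& it : items) it.size = it.isJump ? 2 : it.fixedSize;

    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        ++passes;

        // Compute the address of every item from the current size estimates.
        std::vector<int> addr(items.size() + 1, 0);
        for (std::size_t i = 0; i < items.size(); ++i)
            addr[i + 1] = addr[i] + items[i].size;

        for (std::size_t i = 0; i < items.size(); ++i) {
            Item& it = items[i];
            if (!it.isJump || it.size == 5) continue;
            // rel8 is measured from the end of the 2-byte short jump.
            int rel = addr[it.target] - (addr[i] + 2);
            if (rel < -128 || rel > 127) {
                it.size = 5;     // grow to the near form; it never shrinks again
                changed = true;
            }
        }
    }
    return passes;
}
```

If a short jump skips 124 bytes plus a second jump, and that second jump has to grow to the near form, the first jump only finds out on the following pass that its target has moved out of range. That is the kind of guess-and-correct loop the "passes" and "prediction" wording seems to refer to.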
« Last Edit: June 16, 2018, 04:12:13 AM by Daywalker »

Polly
« Reply #3 on: June 16, 2018, 09:10:41 AM »

If you've never written an assembler* before, I'd recommend taking a look at an open-source assembler for a simple CPU first ( for example this DCPU-16 assembler ). It should give you a general idea on how to write your own assembler.

*An "assembly compiler" is generally referred to as an "assembler" ;)

Daywalker
« Reply #4 on: June 16, 2018, 10:55:45 AM »

Quote from: Polly
If you've never written an assembler* before, I'd recommend taking a look at an open-source assembler for a simple CPU first ( for example this DCPU-16 assembler ). It should give you a general idea on how to write your own assembler.

*An "assembly compiler" is generally referred to as an "assembler" ;)

Alright thanks

Daid
« Reply #5 on: June 17, 2018, 12:40:01 AM »

Quote from: Polly
*An "assembly compiler" is generally referred to as an "assembler" ;)

I think this might be hitting the mark more than you think.

Reading your 2nd (longer) post, I think you have some terminology to catch up on.
https://www.tutorialspoint.com/compiler_design/compiler_design_overview.htm


I assumed you wanted to make the assembler. But reading your 2nd post, I think you might want to make the whole thing, from high level language to machine instructions.
I've been there. And I can tell you, it's a lot of work. I've only made functional script compilers; my other compiler projects died before doing anything functional. Script compilers generally compile to an intermediate format that needs an engine to run in.


But, basic steps you will need then:
Tokenizer: Converting lines into identifiers, numbers, etc. It's quite an easy step, but it makes the next step a lot easier to build.
Parser: I recommend a "recursive descent parser". As a result, you will get an "Abstract syntax tree" (AST). This step is quite complex already.
Code generator: For each function in your AST, you'll need to walk the tree and generate the right machine instructions.
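
The three steps above can be sketched for plain arithmetic expressions. This is a toy under stated assumptions: every name here (Token, Node, Parser, generate, compile) is invented for the example, there is no error handling, and the "machine instructions" are just text for an imaginary stack machine rather than real x86.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// --- Tokenizer: splits "1 + 2 * 3" into numbers and single-character symbols ---
struct Token { char kind; int value; };  // kind: 'n' number, 'e' end, else the symbol itself

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            int v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back({'n', v});
        } else {
            out.push_back({src[i++], 0});  // + - * / ( )
        }
    }
    out.push_back({'e', 0});
    return out;
}

// --- AST: a number leaf, or an operator with two children ---
struct Node {
    char op;    // 'n' for a number, otherwise + - * /
    int value;
    std::unique_ptr<Node> lhs, rhs;
};

// --- Recursive descent parser: one member function per grammar rule ---
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
struct Parser {
    std::vector<Token> toks;
    std::size_t pos = 0;

    std::unique_ptr<Node> factor() {
        if (toks[pos].kind == '(') {
            ++pos;                 // consume '('
            auto inner = expr();
            ++pos;                 // consume ')' (assumed present in this sketch)
            return inner;
        }
        auto leaf = std::make_unique<Node>();
        leaf->op = 'n';
        leaf->value = toks[pos++].value;
        return leaf;
    }
    std::unique_ptr<Node> term() {
        auto left = factor();
        while (toks[pos].kind == '*' || toks[pos].kind == '/') {
            auto n = std::make_unique<Node>();
            n->op = toks[pos++].kind;
            n->lhs = std::move(left);
            n->rhs = factor();
            left = std::move(n);
        }
        return left;
    }
    std::unique_ptr<Node> expr() {
        auto left = term();
        while (toks[pos].kind == '+' || toks[pos].kind == '-') {
            auto n = std::make_unique<Node>();
            n->op = toks[pos++].kind;
            n->lhs = std::move(left);
            n->rhs = term();
            left = std::move(n);
        }
        return left;
    }
};

// --- Code generator: post-order walk of the tree, emitting stack operations ---
void generate(const Node& n, std::vector<std::string>& code) {
    if (n.op == 'n') { code.push_back("PUSH " + std::to_string(n.value)); return; }
    generate(*n.lhs, code);
    generate(*n.rhs, code);
    switch (n.op) {
        case '+': code.push_back("ADD"); break;
        case '-': code.push_back("SUB"); break;
        case '*': code.push_back("MUL"); break;
        case '/': code.push_back("DIV"); break;
    }
}

std::vector<std::string> compile(const std::string& src) {
    Parser p;
    p.toks = tokenize(src);
    auto tree = p.expr();
    std::vector<std::string> code;
    generate(*tree, code);
    return code;
}
```

Calling compile("1 + 2 * 3") yields PUSH 1, PUSH 2, PUSH 3, MUL, ADD: operator precedence falls out of the grammar rules, not out of the code generator.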


Naturally, there is a lot more you could do. Pre-processing, optimizing. But those are all extra steps you can do. Not really required to get a working compiler from A to Z.

Why did I only do scripting? Because the code generation becomes complex very fast with different variable types and register allocation. My script parser just did a pure stack based implementation, which was a lot easier.
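
To show why a pure stack-based target is the easier route, here is a rough sketch of such an engine. It is not the actual script engine described above; Op, Instr and run are hypothetical names. Every operation pops its inputs and pushes its result, so the code generator never has to decide which register holds what.

```cpp
#include <vector>

enum class Op { Push, Add, Sub, Mul, Div };

struct Instr {
    Op  op;
    int operand;  // only used by Push
};

// Executes the program and returns the value left on top of the stack.
// Assumes a well-formed program (no stack underflow, no division by zero).
int run(const std::vector<Instr>& program) {
    std::vector<int> stack;
    for (const Instr& in : program) {
        if (in.op == Op::Push) {
            stack.push_back(in.operand);
            continue;
        }
        // Binary operation: the right operand was pushed last.
        int rhs = stack.back(); stack.pop_back();
        int lhs = stack.back(); stack.pop_back();
        switch (in.op) {
            case Op::Add: stack.push_back(lhs + rhs); break;
            case Op::Sub: stack.push_back(lhs - rhs); break;
            case Op::Mul: stack.push_back(lhs * rhs); break;
            case Op::Div: stack.push_back(lhs / rhs); break;
            case Op::Push: break;  // handled above
        }
    }
    return stack.back();
}
```

Running Push 1, Push 2, Push 3, Mul, Add leaves 7 on the stack. A native code generator would have to make the same computation fit into a limited set of registers, which is where the complexity mentioned above comes from.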


Now, other things you can read into:
The Python AST module: https://docs.python.org/3/library/ast.html gives full access to the AST of Python code. Can give more insight into ASTs themselves.
Python also has its tokenizer available for experimentation: https://docs.python.org/3/library/tokenize.html
Lua is a stack based scripting language. Understanding how Lua works with its stack can help in understanding lower level machine based programming, which also includes a stack: https://www.lua.org/pil/contents.html (Especially the C API)


Another thing you could investigate is just making a new "front end" for the GNU compiler, or LLVM.

Polly
« Reply #6 on: June 17, 2018, 04:45:07 AM »

Quote from: Daid
I assumed you wanted to make the assembler. But reading your 2nd post, I think you might want to make the whole thing, from high level language to machine instructions.

Since he specifically mentions FASM ( which is an x86 assembler ) I actually doubt that.

Crimsontide
« Reply #7 on: June 17, 2018, 05:25:39 AM »

Ya, but he also mentions his own language so... TBH I'm kinda confused about what the end game is as well.

Daywalker
« Reply #8 on: June 17, 2018, 06:50:09 AM »

Quote from: Crimsontide
Ya, but he also mentions his own language so... TBH I'm kinda confused about what the end game is as well.

Well, see the assembler is the easy part (AFAIK), I just need a few formulas to convert from "MOV EAX, EBX" (ascii) to (insert x bits here) - binary. That's the part discussed here.
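
For the register-to-register case the "formula" really is short. A minimal sketch, assuming 32-bit mode with no prefixes and using the 89 /r form of MOV (MOV r/m32, r32); the names Reg32 and encodeMovRegReg are invented for this example:

```cpp
#include <cstdint>
#include <vector>

// Register numbers as the CPU encodes them (note: not alphabetical order).
enum Reg32 : std::uint8_t { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

// Encodes MOV dest, src for two 32-bit registers.
// ModRM byte layout: mod (2 bits) | reg (3 bits) | rm (3 bits).
// mod = 11b means the rm field names a register rather than a memory operand.
// With opcode 0x89 the reg field is the source and the rm field is the destination.
std::vector<std::uint8_t> encodeMovRegReg(Reg32 dest, Reg32 src) {
    std::uint8_t modrm = static_cast<std::uint8_t>(0xC0 | (src << 3) | dest);
    return {0x89, modrm};
}
```

MOV EAX, EBX comes out as the two bytes 89 D8. Memory operands, immediates and other operand sizes need more cases (SIB byte, displacements, prefixes), which is where most of the work hides.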

The higher language that I will be making ("Shadow") is a far more complex task, and of course it's the priority, that's why I don't think I will be starting on this project right away (as I'm currently doing a lot of work trying to figure out how to convert my high level language into assembly instructions).


Quote from: Daid
Reading your 2nd (longer) post, I think you have some terminology to catch up on.
https://www.tutorialspoint.com/compiler_design/compiler_design_overview.htm

Well I don't think I will be doing all those steps. For this project posted here, it will just be ASCII (source code) -> MACHINE-CODE STRUCTURE (linked lists or whatever/structures/arrays) -> PORTABLE EXECUTABLE/DLL (Windows)


Quote from: Daid
I assumed you wanted to make the assembler. But reading your 2nd post, I think you might want to make the whole thing, from high level language to machine instructions.

Yes, but they are not connected. See, I could just make the high level language->ASM, then use FASM as a compiler. But I want to create my own compiler also as an add-on. (which is what I hope you guys might be able to help me with in case I run into a few minor issues :))


Quote from: Daid
I've been there. And I can tell you, it's a lot of work. I've only made functional script compilers; my other compiler projects died before doing anything functional. Script compilers generally compile to an intermediate format that needs an engine to run in.

Yea, I would say I need a strong mastery of assembly and machine code instructions. By the way, I already made a functional compiler and had no issues with it (my current Shadow language compiler, which first takes source code and interprets it, storing it as a structure in memory; then the builder, which recognizes that structure, is able to convert it into ASCII for my target language). In fact it's what I currently use. Though I confess it was extremely difficult to program.

I should point out to you all, as a programmer, I intend to make language-making on the run one of my specialized skills. By the time I'm done with my homework, hopefully I will be able to create language-x in a single day, with many layers of complexity (probably far more than any language currently out there).


Quote from: Daid
But, basic steps you will need then:
Tokenizer: Converting lines into identifiers, numbers, etc. It's quite an easy step, but it makes the next step a lot easier to build.
Parser: I recommend a "recursive descent parser". As a result, you will get an "Abstract syntax tree" (AST). This step is quite complex already.
Code generator: For each function in your AST, you'll need to walk the tree and generate the right machine instructions.

Sounds good (though I confess I don't guarantee I will use all these things mentioned, as I have my own approach to language/compiler design)


Quote from: Daid
Naturally, there is a lot more you could do. Pre-processing, optimizing. But those are all extra steps you can do. Not really required to get a working compiler from A to Z.

Future


Quote from: Daid
Why did I only do scripting? Because the code generation becomes complex very fast with different variable types and register allocation. My script parser just did a pure stack based implementation, which was a lot easier.

Yeah absolutely, I hope I'm able to tackle all the complexity issues as I work more with languages ^.^'


Quote from: Daid
Now, other things you can read into:
The Python AST module: https://docs.python.org/3/library/ast.html gives full access to the AST of Python code. Can give more insight into ASTs themselves.
Python also has its tokenizer available for experimentation: https://docs.python.org/3/library/tokenize.html
Lua is a stack based scripting language. Understanding how Lua works with its stack can help in understanding lower level machine based programming, which also includes a stack: https://www.lua.org/pil/contents.html (Especially the C API)

Cheers


Quote from: Daid
Another thing you could investigate is just making a new "front end" for the GNU compiler, or LLVM.

Not sure what that means, though I might look into it
« Last Edit: June 17, 2018, 09:19:00 AM by Daywalker »

Daid
« Reply #9 on: June 17, 2018, 09:03:39 AM »

Quote from: Daid
Another thing you could investigate is just making a new "front end" for the GNU compiler, or LLVM.
Quote from: Daywalker
Not sure what that means, though I might look into it

The "backends" behind gcc and clang (both C compilers) support multiple languages. So it might be possible to add your own language to this as well. But I have no clue how complex that would be.

Daywalker
« Reply #10 on: June 17, 2018, 09:11:39 AM »

Quote from: Daid
The "backends" behind gcc and clang (both C compilers) support multiple languages. So it might be possible to add your own language to this as well. But I have no clue how complex that would be.

Fair enough, if anything I might look into it as well

Polly
« Reply #11 on: June 17, 2018, 10:19:25 AM »

Ah, if your end goal is a compiler for a new high-level language that changes things a bit :P

If you're going to write a compiler ( that doesn't need inline assembly support ) it's common to use an "integrated assembler" .. which basically means that the compiler generates machine code directly instead of generating the intermediate assembly instructions and invoking an assembler. So as mentioned by Daid, you might want to take a look at ( for example ) LLVM instead of writing everything from scratch yourself.

For instance, Jonathan Blow has been working on his own programming language ( Jai ) since 2014 and is using LLVM as a backend as well.

Daywalker
« Reply #12 on: June 17, 2018, 11:23:10 AM »

Quote from: Polly
So as mentioned by Daid, you might want to take a look at ( for example ) LLVM instead of writing everything from scratch yourself.

I'll consider it

Daid
« Reply #13 on: June 18, 2018, 08:24:55 AM »

Only slightly related, but you mention a dislike for mainstream OSes as well. https://wiki.osdev.org/Expanded_Main_Page is also a treasure trove of information. Especially if you like the low level stuff, as you indicate you do.

Daywalker
« Reply #14 on: June 18, 2018, 09:05:52 AM »

Quote from: Daid
Only slightly related, but you mention a dislike for mainstream OSes as well. https://wiki.osdev.org/Expanded_Main_Page is also a treasure trove of information. Especially if you like the low level stuff, as you indicate you do.

It's a long shot, but I do hope to one day make my own OS :)

(Well I mean I've already made a very basic text-based one.. but I mean a proper one)

Daywalker
« Reply #15 on: June 18, 2018, 09:32:25 AM »

Lol you were right, there is a lot of complexity in this IA-32.. didn't feel like there was.. but this morning I was listing down all the instructions that I consider to be half-relevant (I will cut down on it, but this is the starting point)

Well.. 2 1/2 hours later (which I didn't notice I was writing for that long lol).. and voila, 12 pages full of instructions... (1-2 lines per instruction)

My gawd

Daywalker
« Reply #16 on: June 19, 2018, 05:42:54 AM »

Alright, so I've decided the first version of my compiler will not include the FPU instruction set (since that instruction set is like.. 3 pages long..)

I see a lot of overlap between many of the instructions, so it seems there will be a huge cutoff when I actually get down to it.

I will include basic arithmetic, memory control, and (probably) all the jumps in the IA-32 instruction set. I hope to post a full list of instructions that I will be working with some time today :)

peace

Daywalker
« Reply #17 on: June 19, 2018, 07:22:20 AM »

Currently these instructions are derived purely from the 8086/8088 instruction set


[Arithmetic]

ADD, MUL, SUB, DIV
IMUL, IDIV (signed)

ADC, NEG

INC, DEC (obsolete-ish but meh)


[Memory Control]

MOV, XCHG, PUSH, POP


[Logical]

AND, OR, XOR, NOT


[Bit Control]

SHL, SHR
SAL, SAR

ROL, ROR
RCL, RCR


[Jumps/Calls]

JMP
CALL
RETN, RETF

JA, JAE, JNA, JNAE
JB, JBE, JNB, JNBE
JZ, JNZ
JC, JNC
JE, JNE

JP, JNP
JPE, JPO


Signed:

JG, JGE, JNG, JNGE
JL, JLE, JNL, JNLE
JS, JNS
JO, JNO


[Flag Control]

PUSHF, POPF
CLC, STC
CLD, STD
CLI, STI


[Tests/Comparisons]

TEST, CMP


[Misc]

LEA


[Not Currently Supported]

SAHF, LAHF
CWD, CBW

CMPSB, CMPSW
SCASB, SCASW

IN, OUT

INT, INTO
IRET

LODSB, LODSW
MOVSB, MOVSW
STOSB, STOSW

REP, REPE, REPNE, REPNZ, REPZ


[No Known Usage]

AAA, AAD, AAM, AAS
DAA, DAS

CMC, ESC
LDS, LES
NOP, LOCK
XLAT, WAIT

SBB


[Obsolete]

HLT, JCXZ
LOOP, LOOPE, LOOPNE, LOOPNZ, LOOPZ

I wish there were spoilers or something to make this post less monstrous.. oh well.. xD

Anyway, I will start trying to figure out how to actually compile (a.k.a... assemble) a few of these, and actually create some valid machine code.

Will let you know how it goes as soon as I can :)

Daywalker
« Reply #18 on: June 19, 2018, 11:39:59 AM »

Been doing some digging around and thinking, just trying to figure out how to assemble the MOV instruction:



source: https://c9x.me/x86/html/file_module_x86_id_176.html

The /r seems to mean register, +rb probably means register & byte, then +rw register & word, and +rd register & double word.

/0 probably means whatever number follows of given size (8/16/32 bits).
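
For reference, the notation key in the Intel manual reads a little differently from the guesses above: /r means the instruction has a ModRM byte holding both a register operand and an r/m operand; /0 to /7 mean the reg field of the ModRM byte holds that digit as an extension of the opcode; +rb, +rw and +rd mean a register number from 0 to 7 is added to the opcode byte itself. A small sketch of the last two cases, assuming 32-bit mode and register-only operands (the function names are made up for the example):

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends a 32-bit immediate in little-endian byte order.
void appendImm32(Bytes& out, std::uint32_t imm) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
}

// "B8+rd id": MOV r32, imm32. The register number is added to the opcode byte.
Bytes movRegImm32(unsigned reg, std::uint32_t imm) {
    Bytes out{static_cast<std::uint8_t>(0xB8 + (reg & 7))};
    appendImm32(out, imm);
    return out;
}

// "C7 /0 id": MOV r/m32, imm32. The reg field of ModRM is the fixed digit 0;
// mod = 11b selects a register for the rm field.
Bytes movRmImm32(unsigned reg, std::uint32_t imm) {
    Bytes out{0xC7, static_cast<std::uint8_t>(0xC0 | (0 << 3) | (reg & 7))};
    appendImm32(out, imm);
    return out;
}
```

Both functions load the same value into the same register; movRegImm32(0, 1) gives B8 01 00 00 00 and movRmImm32(0, 1) gives C7 C0 01 00 00 00. An assembler normally picks the shorter encoding.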

Just trying to figure out all the technical details

Crimsontide
« Reply #19 on: June 19, 2018, 11:57:41 AM »

I started work on a small assembler of sorts years ago in C++. The idea was a little different from yours. Rather than output to a file, it would output the assembled code to a chunk of memory, and wrap it like a standard function object. The idea being I could dynamically generate code 'on the fly'.

I got it working for a few instructions before I got bored (or busy, I'm not sure; I was in university at the time), but I can post a few code snippets to give a general idea of the structure. This is all old code from while I was still learning C++ (so while it did work, and the idea was sound IMHO, it's still rather nooby-ish), but maybe it'll give you some ideas.

The class looked like this:

Code:
// ----- FunctorX64 -----
template<class F> class FunctorX64 : public Core::FunctorTemplate<F,boost::function_traits<F>::arity> {

protected:
    // register enumerations
    enum Reg8 { al, bl, cl, dl, ah, bh, ch, dh, sil, dil, bpl, spl, r8b, r9b, r10b, r11b, r12b, r13b, r14b, r15b };
    enum Reg16 { ax, bx, cx, dx, si, di, bp, sp, r8w, r9w, r10w, r11w, r12w, r13w, r14w, r15w };
    enum Reg32 { eax, ebx, ecx, edx, esi, edi, ebp, esp, r8d, r9d, r10d, r11d, r12d, r13d, r14d, r15d };
    enum Reg64 { rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, r8, r9, r10, r11, r12, r13, r14, r15 };
    enum RegXMM { xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15 };

private:
    // register type info (a little tedious this way, but easier than handling function template specializations of enums in a template class)
    bool IsReg (Reg8) const;
    bool IsReg (Reg16) const;
    bool IsReg (Reg32) const;
    bool IsReg (Reg64) const;
    bool IsReg (RegXMM) const;

    bool IsReg8 (Reg8) const;
    bool IsReg8 (Reg16) const;
    bool IsReg8 (Reg32) const;
    bool IsReg8 (Reg64) const;
    bool IsReg8 (RegXMM) const;

    bool IsReg16 (Reg8) const;
    bool IsReg16 (Reg16) const;
    bool IsReg16 (Reg32) const;
    bool IsReg16 (Reg64) const;
    bool IsReg16 (RegXMM) const;

    bool IsReg32 (Reg8) const;
    bool IsReg32 (Reg16) const;
    bool IsReg32 (Reg32) const;
    bool IsReg32 (Reg64) const;
    bool IsReg32 (RegXMM) const;

    bool IsReg64 (Reg8) const;
    bool IsReg64 (Reg16) const;
    bool IsReg64 (Reg32) const;
    bool IsReg64 (Reg64) const;
    bool IsReg64 (RegXMM) const;

    bool IsRegXMM (Reg8) const;
    bool IsRegXMM (Reg16) const;
    bool IsRegXMM (Reg32) const;
    bool IsRegXMM (Reg64) const;
    bool IsRegXMM (RegXMM) const;

    // register info
    bool IsExtendedRegister (Reg8) const; // returns true for extended registers, any register that requires the Rex.R or Rex.B bits to be set
    bool IsExtendedRegister (Reg16) const;
    bool IsExtendedRegister (Reg32) const;
    bool IsExtendedRegister (Reg64) const;
    bool IsExtendedRegister (RegXMM) const;

    bool IsRexRegister (Reg8) const; // true for registers which require a rex prefix possibly without the Rex.B or Rex.R bits set (sil, dil, bpl, spl, r8b - r15b)
    bool IsRexRegister (Reg16) const; // same as IsExtendedRegister
    bool IsRexRegister (Reg32) const; // same as IsExtendedRegister
    bool IsRexRegister (Reg64) const; // same as IsExtendedRegister
    bool IsRexRegister (RegXMM) const; // same as IsExtendedRegister

    bool IsHighRegister (Reg8) const; // returns true for ah, bh, ch, dh
    bool IsHighRegister (Reg16) const; // returns false
    bool IsHighRegister (Reg32) const; // returns false
    bool IsHighRegister (Reg64) const; // returns false
    bool IsHighRegister (RegXMM) const; // returns false

    bool IsLowRegister (Reg8) const; // returns true for al, bl, cl, dl
    bool IsOldRegister (Reg8) const; // returns true for al, bl, cl, dl, ah, bh, ch, dh
    bool IsNewRegister (Reg8) const; // returns true for al, bl, cl, dl, sil, dil, bpl, spl, r8b, ..., r15b

    // helper functions
    byte RegisterValue (Reg8) const;
    byte RegisterValue (Reg16) const;
    byte RegisterValue (Reg32) const;
    byte RegisterValue (Reg64) const;
    byte RegisterValue (RegXMM) const;
    byte GetScale (int) const;

    // prefixes
    void OperandSizePrefix (); // changes the default operand size from 32 to 16 bit, must be used before RexPrefix
    void AddressSizePrefix (); // changes the default address size from 64 to 32 bit, must be used before RexPrefix
    void RexPrefix (bool W, bool R, bool X, bool B); // emits a Rex prefix with the associated WRXB bits set

    // output ModRM byte (if D bit in OP is set rm = src, reg = dest, otherwise reg = src, rm = dest), SIB, Disp, as needed
    void ModRM (Reg8 reg, Reg8 rm);
    void ModRM (Reg16 reg, Reg16 rm);
    void ModRM (Reg32 reg, Reg32 rm);
    void ModRM (Reg64 reg, Reg64 rm);

    // op with reg / reg operands (if D bit in OP is set rm = src, reg = dest, otherwise reg = src, rm = dest), SIB, Disp, as needed
    void OpRR (byte op, Reg8 reg, Reg8 rm);
    void OpRR (byte op, Reg16 reg, Reg16 rm);
    void OpRR (byte op, Reg32 reg, Reg32 rm);
    void OpRR (byte op, Reg64 reg, Reg64 rm);

    // op with an immediate operand (register is encoded in OP, set rexb if the encoded register needs rex.b set)
    void OpI (byte op, uint8, bool rex, bool rexb); // rex needs to be set if sil, dil, bpl, spl are to be used, rex and rexb need to be set for r8b - r15b
    void OpI (byte op, uint16, bool rexb);
    void OpI (byte op, uint32, bool rexb);
    void OpI (byte op, uint64, bool rexb);

    // op with reg / memory access
    template<class TR> void OpRM (byte op, TR reg, int32 disp); // absolute displacement
    template<class TR> void OpRP (byte op, TR reg, int32 disp); // displacement from RIP

    template<class TR, class TA> void OpRM (byte op, TR reg, TA base, int32 disp);
    template<class TR, class TA> void OpRM (byte op, TR reg, int scale, TA index, int32 disp);
    template<class TR, class TA> void OpRM (byte op, TR reg, TA base, int scale, TA index, int32 disp);

protected:
    // basic operations
    void NOP ();
    void RET (); // near (same segment) return
    void FARRET (); // far (different segment) return

    // move reg to reg
    void MOV (Reg8 dest, Reg8 src);
    void MOV (Reg16 dest, Reg16 src);
    void MOV (Reg32 dest, Reg32 src);
    void MOV (Reg64 dest, Reg64 src);

    // move immediate to reg
    void MOV (Reg8 dest, uint8 i);
    void MOV (Reg16 dest, uint16 i);
    void MOV (Reg32 dest, uint32 i);
    void MOV (Reg64 dest, uint64 i);

    // move absolute address (access global / absolute data)
    void LOD (Reg8 dest, const void* src);
    void LOD (Reg16 dest, const void* src);
    void LOD (Reg32 dest, const void* src);
    void LOD (Reg64 dest, const void* src);

    void SAV (Reg8 src, void* dest);
    void SAV (Reg16 src, void* dest);
    void SAV (Reg32 src, void* dest);
    void SAV (Reg64 src, void* dest);

    // load from memory
    void LOD (Reg8 dest, Reg32 base, int scale, Reg32 index, int32 disp = 0); // loads from base + scale * index + disp


    // save to memory
    void SAV (Reg8 src, Reg32 base, int scale, Reg32 index, int32 disp = 0); // save to base + scale * index + disp

    // load from indirect address (no displacement)
    void LOD (Reg8 dest, Reg32 srcAddr);
    void LOD (Reg16 dest, Reg32 srcAddr);
    void LOD (Reg32 dest, Reg32 srcAddr);
    void LOD (Reg64 dest, Reg32 srcAddr);

    void LOD (Reg8 dest, Reg64 srcAddr);
    void LOD (Reg16 dest, Reg64 srcAddr);
    void LOD (Reg32 dest, Reg64 srcAddr);
    void LOD (Reg64 dest, Reg64 srcAddr);

    // function prolog/epilog code

    // debug

public:
    // constructor
    FunctorX64 ();
};

The registers were all enums; with a bit of function overloading you can make the assembler both easy to use and easier to code.

Most of the ops follow a few simple templates, for example 'RR' ops (operations that access two registers) might look like:

Code:
// ----- op reg / reg -----
template<class F> void FunctorX64<F>::OpRR (byte op, Reg8 reg, Reg8 rm) {

    // attempt to encode without REX
    if (IsOldRegister(reg) && IsOldRegister(rm)) {
        AddOp(op);
        ModRM(reg,rm);
        return;
    }

    // encode with REX (Rex.R extends the reg field, Rex.B extends the rm field)
    if (IsNewRegister(reg) && IsNewRegister(rm)) {
        RexPrefix(false,IsExtendedRegister(reg),false,IsExtendedRegister(rm));
        AddOp(op);
        ModRM(reg,rm);
        return;
    }

    // thrown by using the old high 8-bit registers and the new registers in the same op (bh -> r9b for example)
    throw InvalidRegisterCombination("template <typename F> void FunctorX64<F>::OpRR (byte op, Reg8 reg, Reg8 rm) - invalid register combination");
}
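
As a standalone illustration of what a call like RexPrefix(W, R, X, B) has to emit (a sketch, not the original implementation; rexByte and encodeMov64 are hypothetical names), the REX prefix is a single byte with the fixed high nibble 0100 followed by the four flag bits. Registers are given here by their hardware numbers 0..15, not by the enum order used in the class above.

```cpp
#include <cstdint>
#include <vector>

// Builds a REX prefix byte: fixed high nibble 0100, then the W, R, X, B bits.
std::uint8_t rexByte(bool w, bool r, bool x, bool b) {
    return static_cast<std::uint8_t>(0x40 | (w << 3) | (r << 2) | (x << 1) | (b << 0));
}

// MOV dest, src for two 64-bit registers, using opcode 0x89 (MOV r/m64, r64).
// Hardware numbering: 0 = rax, 1 = rcx, 2 = rdx, 3 = rbx, ..., 8 = r8, ..., 15 = r15.
// The low 3 bits of each number go into ModRM; the 4th bit goes into REX.R
// (for the reg field, the source here) or REX.B (for the rm field, the destination).
std::vector<std::uint8_t> encodeMov64(unsigned dest, unsigned src) {
    std::uint8_t rex = rexByte(true, src >= 8, false, dest >= 8);
    std::uint8_t modrm = static_cast<std::uint8_t>(0xC0 | ((src & 7) << 3) | (dest & 7));
    return {rex, 0x89, modrm};
}
```

mov rax, rbx becomes 48 89 D8, mov r8, rax becomes 49 89 C0, and mov rax, r8 becomes 4C 89 C0, so the only difference between the last two is which REX bit carries the extra register bit.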

Another example, an op with an encoded immediate, shows how function overloading can make things quite clean:
Code:
// --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
// op with an immediate operand
// --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

// ----- op immediate -----
template<class F> void FunctorX64<F>::OpI (byte op, uint8 i, bool rex, bool rexb) {
    if (rex || rexb) RexPrefix(false, false, false, rexb); // a bare REX is enough for sil, dil, bpl, spl
    AddOp(op);
    AddImmediate(i);
}

template<class F> void FunctorX64<F>::OpI (byte op, uint16 i, bool rexb) {
    OperandSizePrefix(); // encode 16 bit operand
    if (rexb) RexPrefix(false, false, false, true); // encode extended register
    AddOp(op); // encode op
    AddImmediate(i); // encode immediate value
}

template<class F> void FunctorX64<F>::OpI (byte op, uint32 i, bool rexb) {
    if (rexb) RexPrefix(false, false, false, true);
    AddOp(op);
    AddImmediate(i);
}

template<class F> void FunctorX64<F>::OpI (byte op, uint64 i, bool rexb) {
    RexPrefix(true, false, false, rexb); // Rex.W selects the 64 bit operand size
    AddOp(op);
    AddImmediate(i);
}

An op code might be encoded like:

Code:
template<class F> void FunctorX64<F>::LOD(Reg8 dest, Reg32 srcAddr) { OpRM(0x8a, dest, srcAddr, 0); }

As for all the details of how Intel assembly works at the hex/machine-code level, well, you're just going to have to download and read the official docs, which you can get straight from Intel last I checked. x86/x64 are very complex instruction sets with a ton of 'gotchas'.

Good luck.