Saturday, February 7, 2015

Wednesday, December 24, 2014

Areum is now Volund - Choosing a new Project Name

"Areum" is not as Unique as we Hoped
One of the biggest changes to Areum recently has been a new name.
Choosing a name for a game is always a difficult process, and this one took us a couple of weeks. I consider us fortunate that it didn't take longer!
First off, let me explain the reason why we renamed the 2D MMORPG project. I want to make it clear that everyone on the team loved the name. Unfortunately, it came to my attention that "Areum" is the name of a South Korean idol singer. This might not have been too bad, but when I searched both Twitter and Google, the results were flooded with information about her.

Why Change Names?

Having your brand obscured by another brand of the same name is never a good thing for a business. For one, it makes it difficult to find information about your brand. If your own pages are as deeply buried as they were for us with the name "Areum", people are likely to just give up looking.
Another issue you run into is that your brand name has very likely already been taken on various social media websites. This was certainly the case for Areum, I am sad to say.
A related "problem" is that people who are looking for the other brand may find yours instead. From a publicity standpoint this might not be so bad: Free advertising, right? But it just increases confusion, and there is no guarantee that people will be interested in both brands.

Choosing a new Name

So, on to choosing a new name. The three of us drew up lists and lists of potential names, circling the ones we liked the best. But this was only part of the process: for every potential name, I had to check if it was already taken by another game. I also had to make sure that it would show up in search results.
But that wasn't all: I also had to make sure that the name did not translate into anything offensive in another language. On that note, I also had to make sure that the name did not translate into another brand.
All of these checks needed to be "fuzzy", which just made things more complicated. In other words, I needed to make sure that I checked variations of the name and possible misspellings. I also had to check words which were phonetically similar.
That last one actually saved us at the last minute more than once! There were several occasions where we thought we had found a good name. However, we then discovered that it sounded like another word with an unfortunate meaning.

Settling on the name Volund

After two weeks of this, we were no closer to finding a new name for Areum than when we started. That's when someone on the team, I forget who, suggested using the name "Volund".
For those of you who are just joining our community, I need to explain some history real quick. IfThen Software created an MMORPG named "Volund Preview 1" and released it four years ago on April 10th, 2010. After releasing Preview 1, we moved on to other projects. The server was eventually taken down due to limited resources.
A lot of worldbuilding has already been done for the Volund universe, so this would open up a lot of new options. This sealed the deal for us: Volund would be the new name for Areum, the medieval fantasy 2D MMORPG project.
We are all very excited to revisit the Volund universe! There are a lot of cool ideas that we are itching to try out.
Official Blog for the Indie Game Development Company, IfThen Software

Wednesday, September 28, 2011

DDR SDRAM Memory Bank

Because I'm getting into the section on memory (and some parts of yesterday's example made me curious, particularly the RAS-to-CAS delay) I have decided to research RAM again.  I did quite a bit of research in this area about a month ago, so a lot of this is review and should go by relatively quickly.

Data is stored in RAM in an array of bits called a "bank".

A horizontal line of bits makes up a "word line" and a vertical line of bits makes up a "bit line".

The array is split up into rows and columns.  A row is equivalent to a word line, so that's easy enough.  A column is made up of multiple contiguous bit lines, either 4, 8, or 16 depending on the architecture of the chip.  The number of bit lines which make up a column is the word width of the chip, or just simply the "width". This is usually written with an "x" followed by the width: x4, x8, and x16.

The number of rows depends on the generation of DDR and the "density" of the chip.  The density is the number of bits the chip has total across all banks.  The number of columns depends on the density and width of the chip.  The exact row and column counts for the possible densities and widths are specified by the DDR standard for the generation in question, although my experience is that you have to perform some calculations to find them.  Here is a table I put together of the row and column counts for the first generation of DDR:
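As a rough sketch of the kind of calculation involved (separate from the table), assume the density is simply banks × rows × columns × width, which follows from the definitions above. The function name `ddr_columns` and its parameters are my own, not something taken from the DDR standard:

```c
#include <stdint.h>

/*
 * Assumed relationship, following the definitions in this post:
 *   density (bits) = banks * rows * columns * width
 * Solving for the column count gives the division below.
 * Returns 0 if any input would make the division meaningless.
 */
uint64_t ddr_columns(uint64_t density_bits, unsigned banks,
                     uint64_t rows, unsigned width)
{
    /* Bits contributed by one column position across every row and bank. */
    uint64_t bits_per_column = (uint64_t)banks * rows * width;

    if (bits_per_column == 0) {
        return 0;
    }
    return density_bits / bits_per_column;
}
```

For example, calling `ddr_columns` with a density of 268,435,456 bits (256 Mbit), 4 banks, 8192 rows, and a width of 8 gives 1024 columns. Treat those inputs as illustrative and check them against the standard for the generation in question.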

Tuesday, September 27, 2011

Memory Access Latency

A predicated instruction is an instruction whose execution depends on the result of a true/false test.  Another way to look at it is a single instruction for code like the following: 

if (a > b) c = 6;

Predicated instructions can help to reduce the number of branches in your code, which may increase how fast your program executes.
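To make the idea a bit more concrete, here is a minimal sketch in C that compares the branching form with a branch-free version that imitates predication in software. The function names are hypothetical. Whether a compiler actually emits a predicated or conditional-move instruction for either form depends on the compiler and the target processor, so this only illustrates the concept:

```c
/* Branching form: the compiler may emit a conditional jump here. */
int assign_with_branch(int a, int b, int c)
{
    if (a > b) c = 6;
    return c;
}

/*
 * Branch-free form: the result of the test becomes a mask that
 * selects between the new value (6) and the old value of c.
 * This imitates predication in software; it is not a real
 * predicated instruction.
 */
int assign_with_select(int a, int b, int c)
{
    int take = (a > b);   /* 1 when the test is true, otherwise 0 */
    int mask = -take;     /* all ones when true, all zeros when false */

    return (6 & mask) | (c & ~mask);
}
```

Both functions return 6 when `a > b` and the original `c` otherwise; the second one just gets there without an `if`.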

On a slight tangent, I also learned what a transistor is: A resistor whose value (resistance) changes.  I still don't know how they are used or why there are so many in a processor, but I've satisfied my curiosity for the moment.  I highly recommend this video on the subject:

You can classify a processor as having either a Brainiac design or a Speed-Demon design based on how much it pushes for ILP.  A Brainiac design throws as much hardware at the problem as possible, sacrificing simplicity and size for more ILP.  A Speed-Demon design relies on the compiler to schedule instructions in a way that extracts as much ILP out of the code as possible.  A Speed-Demon design is relatively simple and small, allowing for higher clock speeds (until the thermal brick wall was hit in the early 2000s) which is how it got its name.

I finally started learning about memory access.  One of the reasons I started researching CPU architecture was to find out why a Load-Hit-Store on a Xenon (the Xbox 360 processor) could cause a stall of up to 80 cycles, and I think I am getting close to an answer.  To reiterate an example from Modern Processors - A 90 Minute Guide, let's say you have a 2.0GHz CPU and 400MHz SDRAM.  Let's also say that it takes 1 cycle for the memory address to be sent from the CPU to the memory controller, 1 cycle to get to the DIMM, 5 cycles for the RAS-to-CAS delay (assuming there is a page miss, as is likely with the memory hierarchy we have today), another 5 cycles for CAS, then 1 cycle to send the data to the prefetch buffer, 1 cycle to the memory controller, and finally 1 cycle to the CPU.  In total we have 15 memory-clock cycles (assuming the FSB:DRAM ratio is 1:1) to get data from the main memory.  To convert this into CPU clock cycles, multiply it by the CPU:FSB ratio (CPU multiplier), which in this case is 2.0GHz / 400MHz = 5.  So 15*5 = 75 CPU clock cycles before the data is received from the main memory.  A diagram is definitely easier for me to understand, so here is a drawing I created to help me understand how this latency was calculated:
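The same arithmetic can also be written out as a small C sketch. The struct and function names are my own, and the per-stage cycle counts are just the ones from the example above, not measurements of any real system:

```c
/* Per-stage latencies, in memory-clock cycles, for one trip to main memory. */
typedef struct {
    int cpu_to_controller;     /* address: CPU -> memory controller */
    int controller_to_dimm;    /* address: memory controller -> DIMM */
    int ras_to_cas;            /* RAS-to-CAS delay (page miss assumed) */
    int cas;                   /* CAS latency */
    int to_prefetch_buffer;    /* data -> prefetch buffer */
    int buffer_to_controller;  /* data -> memory controller */
    int controller_to_cpu;     /* data -> CPU */
} MemoryPath;

/* Total latency in memory-clock cycles (assumes an FSB:DRAM ratio of 1:1). */
int memory_cycles(const MemoryPath *path)
{
    return path->cpu_to_controller + path->controller_to_dimm +
           path->ras_to_cas + path->cas + path->to_prefetch_buffer +
           path->buffer_to_controller + path->controller_to_cpu;
}

/* Convert to CPU clock cycles using the CPU:FSB ratio (CPU multiplier). */
int cpu_cycles(const MemoryPath *path, int cpu_multiplier)
{
    return memory_cycles(path) * cpu_multiplier;
}
```

Filling a `MemoryPath` with `{1, 1, 5, 5, 1, 1, 1}` gives 15 memory-clock cycles, and `cpu_cycles` with a multiplier of 5 gives the 75 CPU clock cycles worked out above.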

VLIW and Researching Interlocks

Trying out a different work schedule, so my blog entries may be a little erratic while I work out some of the kinks.  So anyways, this blog post is for yesterday.

Today I learned about a couple new techniques.  The first one is something called "very long instruction word" (VLIW).  In VLIW, a single instruction is actually composed of multiple smaller instructions which have been packed together by the compiler.  The fetch and decode stages of the pipeline can effectively work with multiple instructions in parallel, but they only have to deal with a single instruction.  The decode stage unpacks the sub-instructions and sends them to the appropriate functional units.  The decode stage does not detect hazards and the pipeline can generally only be stalled on a cache miss, so it is the job of the compiler to insert NOP instructions (no-operation) to prevent hazards.  A processor which makes use of VLIW is said to have an explicitly parallel design.
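Here is a very simplified sketch of what the compiler's packing job might look like. Everything in it is hypothetical: a two-slot bundle with one ALU slot and one FPU slot, and a greedy in-order packer named `pack_bundles`. It only fills unused slots with NOPs; it does not look at data dependencies at all, which a real VLIW compiler would also have to do in order to prevent hazards:

```c
#include <stddef.h>

typedef enum { OP_NOP, OP_ALU, OP_FPU } OpKind;

/* Hypothetical two-slot bundle: one slot per functional unit. */
typedef struct {
    OpKind alu_slot;
    OpKind fpu_slot;
} Bundle;

/*
 * Greedy, in-order packing. Each bundle takes the next ALU op (if it is
 * next in program order) and then the next FPU op (if it is next).
 * Slots that cannot be filled stay as NOPs.
 * Returns the number of bundles written to 'out'.
 */
size_t pack_bundles(const OpKind *ops, size_t count,
                    Bundle *out, size_t capacity)
{
    size_t used = 0;
    size_t i = 0;

    while (i < count && used < capacity) {
        Bundle bundle = { OP_NOP, OP_NOP };  /* empty slots default to NOP */

        if (i < count && ops[i] == OP_ALU) {
            bundle.alu_slot = OP_ALU;
            i++;
        }
        if (i < count && ops[i] == OP_FPU) {
            bundle.fpu_slot = OP_FPU;
            i++;
        }
        /* An explicit NOP in the input becomes an all-NOP bundle. */
        if (bundle.alu_slot == OP_NOP && bundle.fpu_slot == OP_NOP) {
            i++;
        }
        out[used++] = bundle;
    }
    return used;
}
```

With the input ALU, FPU, ALU, ALU this produces three bundles: the first holds both an ALU and an FPU operation, and the other two each hold an ALU operation with a NOP in the FPU slot.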

The other technique is something called an "interlock".  I am still researching the specifics, but an interlock in general is some mechanism which prevents harm to either the operator or the machine (an example would be the mechanism which locks the oven door during a self-clean cycle).  In the case of processors, an interlock can detect a data hazard in the decode stage of the pipeline and stall the previous stages while sending NOPs out of the decode stage until the hazard has been resolved.  I am assuming that an interlock is also used to prevent one stage from passing its output to the next stage in the event that the next stage is stalled (for example, a particular instruction is taking multiple clock cycles to execute and the fetch and decode stages must be stalled).

Wednesday, September 21, 2011

Google Docs Organized

I spent a good chunk of the day organizing my Google Docs page; it was becoming impossible to find anything.  Whenever I came across a resource while researching something, I would tell the Google Viewer to save it to my documents page but that caused things to become quite cluttered.

The remainder of the day was spent reading Pipelining: An Overview at Ars Technica.  It's mostly been review, so I don't have much to report.  However, I did learn that a non-pipelined processor is also called a "Single-Cycle Processor".  The article series also did a good job of explaining what stalls and pipeline bubbles are, including a few interesting graphs.

Tuesday, September 20, 2011

Instruction Throughput Decides All

I finally have an answer to my question!  PleasingFungus from the #tigIRC channel on Esper discussed the question with me for two hours straight today and this is the answer I ended up with:

If a scalar (not superscalar) processor has multiple nonredundant functional units (a single FPU and a single ALU for example) and it issues an instruction to the FPU at a certain clock tick, could it issue another instruction to the ALU at the next clock tick so that both the FPU and the ALU are executing at the same time (let's say that FPU instructions take longer to execute than ALU instructions)?  Or would that make the processor superscalar?
A processor is only superscalar if its maximum instruction throughput is greater than 1.  Because only a single instruction is being issued each clock cycle, the maximum instruction throughput is 1, so the processor is not superscalar.  A processor can have multiple pipelines, but if the maximum instruction throughput is equal to or less than 1 the processor is not considered to be superscalar.

Instruction throughput is a measure of the number of instructions which can be processed by the pipeline (committed) per cycle.  Because not all instructions get through the pipeline in the same amount of time, the "average instruction throughput" can change depending on which instructions were executed.  The maximum instruction throughput, however, is a static measurement representing the best case scenario.

Even if the instruction pipeline splits into two separate pipelines (one for each functional unit) at the issue stage (or some other point), if only one instruction is dispatched per clock cycle the processor has an instruction throughput less than or equal to 1.  Two instructions might complete at the same time (if one functional unit is slower than the other), thus giving the illusion of having an instruction throughput of 2, but it will take at least one cycle to "recover" from this, during which no instructions will complete; so it evens out.
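This can be checked with a toy model. The sketch below is my own simplification, not a model of any real processor: one instruction is issued per cycle in program order, the latencies (1 cycle for the ALU, 3 for the FPU) are made up, fetch and decode are ignored, and each functional unit is assumed to accept a new instruction every cycle:

```c
#include <stddef.h>

/* Hypothetical latencies, in cycles, plus a small bound for the sketch. */
enum { ALU_LATENCY = 1, FPU_LATENCY = 3, MAX_CYCLES = 64 };

typedef enum { UNIT_ALU, UNIT_FPU } Unit;

typedef struct {
    int total_cycles;      /* cycles from first issue to last completion */
    int peak_completions;  /* most instructions completing in one cycle */
} PipelineStats;

/*
 * Issue instruction i in cycle i (one issue per cycle) and count how many
 * instructions complete in each cycle. Returns zeroed stats if the program
 * is too long for the fixed-size buffer.
 */
PipelineStats simulate_single_issue(const Unit *program, size_t count)
{
    PipelineStats stats = { 0, 0 };
    int completions[MAX_CYCLES] = { 0 };

    if (count + FPU_LATENCY > MAX_CYCLES) {
        return stats;
    }
    for (size_t i = 0; i < count; i++) {
        int latency = (program[i] == UNIT_FPU) ? FPU_LATENCY : ALU_LATENCY;
        int done = (int)i + latency - 1;  /* cycle in which it completes */

        completions[done]++;
        if (completions[done] > stats.peak_completions) {
            stats.peak_completions = completions[done];
        }
        if (done + 1 > stats.total_cycles) {
            stats.total_cycles = done + 1;
        }
    }
    return stats;
}
```

For the program FPU, ALU, ALU, two instructions complete in the same cycle (a peak of 2), but all three still take 3 cycles in total, so the throughput is 3 / 3 = 1. In this model the last instruction is issued in cycle `count - 1` and cannot complete any earlier than that, so the total is always at least `count` cycles and the throughput never exceeds 1.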

I am still doing some research into this topic, although I currently consider myself to be about 50% finished with the question.  I plan on reading the Pipelining: An Overview series at Ars Technica after working through some example pipelines on paper.