Processing

When working with MICRESS tabular data in MicPy, you will often need to extract subsets of the data, select individual columns or rows, and access specific values within a DataFrame. This section walks through the key techniques for these tasks so that you can analyze your MICRESS results in detail.

Data structure

When you read a tabular file with MicPy, the data is returned as a Pandas DataFrame object. A DataFrame is a two-dimensional data structure that organizes data into rows and columns, similar to a table or spreadsheet. Each row typically represents an observation, such as a time step in a MICRESS simulation, while each column represents a variable or feature, such as temperature or phase fractions.

Below is an example of a DataFrame containing the results of a MICRESS Delta-Gamma transformation simulation:

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
0   0.0000               1786.00000       1.000000                 0.000000                 0.000000
1   1.0000               1785.00000       0.998751                 0.001249                 0.000000
2   2.5000               1783.50000       0.992867                 0.007133                 0.000000
3   5.0000               1781.00000       0.976858                 0.023142                 0.000000
4   7.5000               1778.50000       0.959798                 0.040202                 0.000000
... ...                  ...              ...                      ...                      ...

To effectively work with your data, you need to know how to access and manipulate it within the DataFrame. The following sections introduce various methods provided by Pandas for accessing specific subsets of data, selecting individual columns or rows, and retrieving specific values.
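As a minimal sketch of how to inspect this structure, the snippet below rebuilds the first five rows of the example table by hand and queries its shape, column labels, and data types with standard Pandas attributes. The hand-built DataFrame is only a stand-in for the one MicPy would return, so no MICRESS file is needed to run it.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
# (values copied from the first five rows of the example table)
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
        "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
        "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.007133, 0.023142, 0.040202],
        "Fraction Phase 2 FCC_A1": [0.0, 0.0, 0.0, 0.0, 0.0],
    }
)

n_rows, n_columns = df.shape      # rows (time steps) and columns (variables)
column_labels = list(df.columns)  # labels used later for column selection

print(n_rows, n_columns)
print(column_labels)
print(df.dtypes)   # data type of each column
print(df.head(3))  # first three rows
```

Checking the exact column labels first is worthwhile, because every label-based selection has to match them character for character, including the units in square brackets.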

Reading the file

To load data from a MICRESS tabular file into a DataFrame, you can use the read() function from the tab module in MicPy. This function parses the file and returns a DataFrame object.

from micpy import tab

df = tab.read("A001_Delta_Gamma.TabF")

Selecting columns

Columns in a DataFrame represent different variables or features. You can select individual columns by specifying the column label within square brackets. For example, to extract the temperature column:

temperature = df["Temperature [K]"]

This returns a Pandas Series object, which is a one-dimensional array representing the selected column.

    Temperature [K]
0   1786.00000
1   1785.00000
2   1783.50000
3   1781.00000
4   1778.50000
... ...

To select multiple columns, pass a list of column labels:

phase_fractions = df[["Fraction Phase 0 LIQUID", "Fraction Phase 1 BCC_A2"]]

This returns a DataFrame containing the selected columns:

    Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2
0   1.000000                 0.000000
1   0.998751                 0.001249
2   0.992867                 0.007133
3   0.976858                 0.023142
4   0.959798                 0.040202
... ...                      ...
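The difference between single and double square brackets is easy to overlook, so here is a small sketch of it. It assumes a hand-built two-column DataFrame as a stand-in for the MicPy result. A single label returns a Series, while a list with one label returns a DataFrame with one column.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

as_series = df["Temperature [K]"]   # single label: one-dimensional Series
as_frame = df[["Temperature [K]"]]  # list of labels: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)

# A Series offers reductions directly on the column values
lowest_temperature = as_series.min()
print(lowest_temperature)
```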

Selecting rows

Rows in a DataFrame often correspond to time steps. You can select rows with the loc[] indexer, which uses index labels, or with the iloc[] indexer, which uses integer positions.

To select the first row using loc[]:

first_row_by_label = df.loc[0]

This returns a Series object with the data from the first row:

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
0   0.0000               1786.00000       1.000000                 0.000000                 0.000000

To select the last row using iloc[]:

last_row_by_index = df.iloc[-1]

This also returns a Series object, containing the data from the last row:

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
20  50.0000              1751.45000       0.467904                 0.259227                 0.272869

To select multiple rows, pass a list of index labels to loc[] or a list of integer positions to iloc[]:

selected_rows = df.loc[[0, 1, 2]]

This returns a DataFrame with the specified rows.

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
0   0.0000               1786.00000       1.000000                 0.000000                 0.000000
1   1.0000               1785.00000       0.998751                 0.001249                 0.000000
2   2.5000               1783.50000       0.992867                 0.007133                 0.000000
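With the default integer index, loc[] and iloc[] often give the same result, which can hide how they differ. The sketch below uses a hand-built stand-in DataFrame to show two cases where they diverge: slices, and DataFrames whose index no longer starts at 0 after filtering.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

# Label slices include the end label; position slices exclude the end position
rows_by_label = df.loc[1:3]      # labels 1, 2 and 3
rows_by_position = df.iloc[1:3]  # positions 1 and 2

# Filtering keeps the original index labels
cooled = df[df["Temperature [K]"] < 1784]
print(list(cooled.index))

# iloc[0] is the first remaining row, whatever its label;
# cooled.loc[0] would raise a KeyError because label 0 was filtered out
first_cooled = cooled.iloc[0]
print(first_cooled["Temperature [K]"])
```

As a rule of thumb, use iloc[] when you mean "the n-th row as currently ordered" and loc[] when you mean "the row with this index label".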

Selecting values

To access a single value within a DataFrame, use the at[] accessor for label-based indexing or iat[] for integer-based indexing.

For instance, to get the temperature value from the first row:

initial_temperature = df.at[0, "Temperature [K]"]

This returns the float value 1786.0.
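For comparison, the following sketch retrieves the same value with iat[], looking up the column's integer position with the standard Pandas call columns.get_loc(). It also shows that at[] can assign a single value. The DataFrame is a hand-built stand-in, and the assigned value 1786.5 is arbitrary and used purely for illustration.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

# Integer position of the temperature column
temperature_position = df.columns.get_loc("Temperature [K]")

value_by_label = df.at[0, "Temperature [K]"]
value_by_position = df.iat[0, temperature_position]
print(value_by_label, value_by_position)

# at[] can also assign; work on a copy so the original data stays untouched
modified = df.copy()
modified.at[0, "Temperature [K]"] = 1786.5  # arbitrary illustrative value
print(modified.at[0, "Temperature [K]"])
```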

Filtering rows

Filtering allows you to select rows that meet specific criteria. This is done using boolean indexing. For example, to select rows where the temperature exceeds a certain threshold:

high_temperature_data = df[df["Temperature [K]"] > 1780]

This returns a DataFrame containing only the rows where the temperature is greater than 1780 Kelvin.

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
0   0.0000               1786.00000       1.000000                 0.000000                 0.000000
1   1.0000               1785.00000       0.998751                 0.001249                 0.000000
2   2.5000               1783.50000       0.992867                 0.007133                 0.000000
3   5.0000               1781.00000       0.976858                 0.023142                 0.000000

You can combine multiple conditions using logical operators such as & (AND) and | (OR). For example, to filter rows where the temperature is between 1750 K and 1760 K:

temperature_range_data = df[
    (df["Temperature [K]"] >= 1750) & (df["Temperature [K]"] <= 1760)
]
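Each condition needs its own parentheses because & and | bind more tightly than comparison operators in Python. As a sketch of an alternative, a closed range can also be written with the Pandas method Series.between(), and ~ negates a condition. The example assumes a hand-built stand-in DataFrame and uses the range 1780 K to 1785 K so that it matches the sample rows.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

temperature = df["Temperature [K]"]

# Two comparisons combined with & (note the parentheses)
explicit_range = df[(temperature >= 1780) & (temperature <= 1785)]

# Same selection with between(), which includes both bounds by default
between_range = df[temperature.between(1780, 1785)]

# ~ inverts the boolean mask: rows outside the range
outside_range = df[~temperature.between(1780, 1785)]

print(explicit_range)
print(outside_range)
```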

Sorting rows

Sorting rows allows you to reorder your DataFrame based on the values in one or more columns. To sort rows by simulation time in descending order, use the sort_values() method:

sorted_data = df.sort_values(by="Simulation time [s]", ascending=False)

This returns a DataFrame with rows sorted from the highest to the lowest simulation time.

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1
20  50.0000              1751.45000       0.467904                 0.259227                 0.272869
19  45.0000              1751.37500       0.464527                 0.259414                 0.276058
18  40.0000              1751.47500       0.467859                 0.260437                 0.271704
17  35.0000              1751.42500       0.461359                 0.260637                 0.278005
... ...                  ...              ...                      ...                      ...

For sorting by multiple columns, pass a list of column names:

sorted_data = df.sort_values(
    by=["Simulation time [s]", "Temperature [K]"], ascending=[False, True]
)

This sorts by simulation time in descending order, then by temperature in ascending order for rows with the same simulation time.
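Two details of sort_values() are worth a short sketch: it returns a new DataFrame and leaves the original untouched, and each row keeps its original index label. If you want a fresh 0-based index after sorting, reset_index(drop=True) provides one. The DataFrame below is a hand-built stand-in for the MicPy result.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 1.0, 2.5, 5.0, 7.5],
        "Temperature [K]": [1786.0, 1785.0, 1783.5, 1781.0, 1778.5],
    }
)

sorted_data = df.sort_values(by="Simulation time [s]", ascending=False)

print(list(df.index))           # the original order is unchanged
print(list(sorted_data.index))  # index labels travel with their rows

# Replace the shuffled labels with a fresh 0-based index
renumbered = sorted_data.reset_index(drop=True)
print(renumbered)
```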

Creating columns

You can create new columns in a DataFrame by performing arithmetic operations on existing columns. For example, to convert the temperature from Kelvin to Celsius:

df["Temperature [°C]"] = df["Temperature [K]"] - 273.15

This adds a new column Temperature [°C] to your DataFrame:

    Simulation time [s]  Temperature [K]  Fraction Phase 0 LIQUID  Fraction Phase 1 BCC_A2  Fraction Phase 2 FCC_A1  Temperature [°C]
0   0.0000               1786.00000       1.000000                 0.000000                 0.000000                 1512.85000
1   1.0000               1785.00000       0.998751                 0.001249                 0.000000                 1511.85000
2   2.5000               1783.50000       0.992867                 0.007133                 0.000000                 1510.35000
3   5.0000               1781.00000       0.976858                 0.023142                 0.000000                 1507.85000
4   7.5000               1778.50000       0.959798                 0.040202                 0.000000                 1505.35000
... ...                  ...              ...                      ...                      ...                      ...
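New columns can also combine several existing columns. The sketch below adds the two solid phase fractions into a single column and checks that liquid and solid fractions sum to one. The column label "Fraction Solid" is a hypothetical name chosen for this example, and the DataFrame is a hand-built stand-in for the MicPy result.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame returned by MicPy
df = pd.DataFrame(
    {
        "Fraction Phase 0 LIQUID": [1.000000, 0.998751, 0.992867, 0.976858, 0.959798],
        "Fraction Phase 1 BCC_A2": [0.000000, 0.001249, 0.007133, 0.023142, 0.040202],
        "Fraction Phase 2 FCC_A1": [0.0, 0.0, 0.0, 0.0, 0.0],
    }
)

# "Fraction Solid" is a hypothetical label, not a MICRESS output column
df["Fraction Solid"] = df["Fraction Phase 1 BCC_A2"] + df["Fraction Phase 2 FCC_A1"]

# Sanity check: largest deviation of the total phase fraction from 1
total_fraction = df["Fraction Phase 0 LIQUID"] + df["Fraction Solid"]
max_deviation = (total_fraction - 1.0).abs().max()

print(df)
print(max_deviation)
```

The operation is applied element by element, so no explicit loop over the rows is needed.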

Grouping data

Grouping data allows you to aggregate information based on specific criteria. For example, to calculate the average number of grain neighbors for each time step of a grain growth simulation:

from micpy import tab

df = tab.read("T10_01_GrainGrowth_2D.TabGD")

average_neighbors = df.groupby("Simulation time [s]")["Nb. of Neighbours"].mean()

This returns a Series indexed by simulation time:

Simulation time [s]  Nb. of Neighbours
0.0                  6.537500
5.0                  6.226415
10.0                 6.476190
15.0                 6.492063
...                  ...
300.0                6.000000

df contains the grain growth simulation data, where each row represents a grain at a specific time step. df.groupby("Simulation time [s]") groups the data by simulation time, and ["Nb. of Neighbours"].mean() calculates the average number of neighbors for each time step.
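If you need more than one statistic per time step, the Pandas agg() method accepts a list of aggregation names. The sketch below uses a small hand-made table with invented neighbour counts, not actual MICRESS output, and reuses the column labels from the grain growth example.

```python
import pandas as pd

# Toy data with invented values: one row per grain and time step
df = pd.DataFrame(
    {
        "Simulation time [s]": [0.0, 0.0, 0.0, 5.0, 5.0],
        "Nb. of Neighbours": [5, 6, 7, 6, 8],
    }
)

# One row per time step, one column per statistic
neighbor_stats = df.groupby("Simulation time [s]")["Nb. of Neighbours"].agg(
    ["mean", "min", "max", "count"]
)

print(neighbor_stats)
```

Here the count column gives the number of grains present at each time step, which is useful context for interpreting the mean.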

Conclusion

The techniques covered here (selecting columns, rows, and values, filtering, sorting, creating columns, and grouping) are the fundamental operations for analyzing MICRESS tabular data with MicPy and Pandas. For more advanced data processing tasks, refer to the official Pandas documentation.