My analysis doesn't show the region of my country

juan_dc · May 12, 2022, 6:07pm

Hello, thank you for this excellent tool.

I want to run an analysis of Colombia plus the 6 global regions, but at the end of my analysis Colombia and the other regions appear while South America is missing.
Any advice? This is the file I use:

inputs:
  - name: prueba-3_1
    metadata: data/prueba1/prueba_global.tsv
    sequences: data/prueba1/prueba_global.fasta

builds:
  global_prueba3_1:
    subsampling_scheme: colombia_sampling
    region: South America
    country: Colombia


subsampling:
  colombia_sampling:
    country:
      group_by: "division year month"
      max_sequences: 200
      exclude: "--exclude-where 'country!={country}'"
  
    global:
      group_by: "country year month"
      seq_per_group: 1
      priorities:
        type: "proximity"
        focus: "country"     

traits:
  global_prueba3_1:
    sampling_bias_correction: 5.0
    columns: ["country", "division"]
files:
  auspice_config: "my_profiles/my_runs/test3/my_auspice_config.json"
  description: "my_profiles/my_runs/test3/my_description.md"

Thank you for your time

james · May 12, 2022, 7:41pm

Hi @juan_dc – I think this is the same issue as this recent post: Losing country info in final build and should be fixed by removing the region: South America from your builds declaration. (Your subsampling scheme doesn’t use this value so it should have no detrimental effects.)

juan_dc · May 12, 2022, 8:54pm

Hi, I tried deleting the region line, and in the final phase I get this error:

augur traits is using TreeTime version 0.8.6
Assigned discrete traits to 1286 out of 1286 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.
ERROR: 300 or more distinct discrete states found. TreeTime is currently not set up to handle that many states.
[Thu May 12 15:19:08 2022]
Error in rule traits:
jobid: 32
output: results/global_prueba3_1/traits.json
log: logs/traits_global_prueba3_1.txt (check log file(s) for error message)
shell:

    augur traits             --tree results/global_prueba3_1/tree.nwk             --metadata results/global_prueba3_1/metadata_adjusted.tsv.xz             --output results/global_prueba3_1/traits.json             --columns country division             --confidence             --sampling-bias-correction 5.0 2>&1 | tee logs/traits_global_prueba3_1.txt
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile logs/traits_global_prueba3_1.txt:
augur traits is using TreeTime version 0.8.6
Assigned discrete traits to 1286 out of 1286 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.
ERROR: 300 or more distinct discrete states found. TreeTime is currently not set up to handle that many states.

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-05-12T145042.133201.snakemake.log

The error goes away if I add region back, but then I lose South America again.

james · May 12, 2022, 9:20pm

When you specify a region in the build (as above), the metadata is modified such that a sample from outside South America (e.g. “France”) has its country changed to the corresponding region (e.g. “Europe”). This has the effect of reducing the set of values for the country key which you are performing a DTA on (via augur traits).
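As a rough illustration of that adjustment (a sketch only, not the workflow's actual script; the function name `collapse_outside_region` and the sample rows are made up for this example), the effect on the set of country values looks something like this:

```python
def collapse_outside_region(rows, focal_region):
    """Return copies of the rows in which every sample outside
    focal_region has its country replaced by its region.
    Hypothetical helper, written only to illustrate the idea."""
    adjusted = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row["region"] != focal_region:
            new_row["country"] = new_row["region"]
        adjusted.append(new_row)
    return adjusted


# Made-up sample metadata, not taken from the real dataset.
rows = [
    {"strain": "a", "region": "South America", "country": "Colombia"},
    {"strain": "b", "region": "South America", "country": "Peru"},
    {"strain": "c", "region": "Europe", "country": "France"},
    {"strain": "d", "region": "Europe", "country": "Spain"},
]

adjusted = collapse_outside_region(rows, "South America")
countries = sorted({row["country"] for row in adjusted})
print(countries)  # ['Colombia', 'Europe', 'Peru']
```

France and Spain collapse into a single "Europe" state, while countries inside the focal region keep their own names, so the number of states for the country trait shrinks.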

When you remove region: South America you don’t modify the metadata (good) but this results in lots of countries which you are performing DTA on (bad). In this case, I’d recommend removing "country" from the list of traits to run DTA on; I’d also remove division (which will probably have more demes than country). Reconstructing region might be ok.
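If it helps to check this before running the build, here is a minimal sketch that counts the distinct values in each candidate trait column of a tab-separated metadata file. The function name `count_distinct_states` and the example rows are assumptions for illustration; the 300 threshold comes from the TreeTime error message above, and how augur itself treats missing values is not modelled here.

```python
import csv
import io


def count_distinct_states(handle, columns, delimiter="\t"):
    """Count distinct non-empty values per column in a delimited
    metadata file. Hypothetical helper, not part of augur."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    states = {col: set() for col in columns}
    for row in reader:
        for col in columns:
            value = row.get(col)
            if value:  # skip empty or missing cells
                states[col].add(value)
    return {col: len(values) for col, values in states.items()}


# Made-up metadata standing in for the real TSV.
example = io.StringIO(
    "strain\tregion\tcountry\tdivision\n"
    "a\tSouth America\tColombia\tAntioquia\n"
    "b\tSouth America\tColombia\tBogota\n"
    "c\tSouth America\tPeru\tLima\n"
    "d\tEurope\tFrance\tIle-de-France\n"
)

counts = count_distinct_states(example, ["region", "country", "division"])
# Columns with 300 or more states would trigger the error shown above.
over_limit = [col for col, n in counts.items() if n >= 300]
print(counts, over_limit)
```

For the compressed file used by the traits rule, the same function should work on a handle opened with `lzma.open("results/global_prueba3_1/metadata_adjusted.tsv.xz", "rt", newline="")`.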

juan_dc · May 12, 2022, 10:49pm

Hello, I removed the country and division entries from traits, but when I looked in auspice, every country worldwide was shown separately.

Is there any way that, when selecting country (Colombia) in auspice, the rest of the world remains grouped as regions (North America, South America…)?

Image result

I don’t know if I’m doing something wrong or if I need to modify the my_auspice_config.json file

thanks for your time

juan_dc · May 12, 2022, 10:50pm

This is the result I want, but I also need South America.

Posted in a separate message due to the limit of 1 image per post.

james · May 12, 2022, 11:11pm

Thanks for those images, they’re helpful.

So we do want region: South America in the build declaration, as that will change the metadata of non-South-American countries to be their region (continent) as shown on the second screenshot. Note that this happens after subsampling.

I’m not sure why South American countries are being filtered out; it looks to me like the global part of your subsampling scheme should include these. (The reason Colombian samples are included is due to the country part of the scheme.)

You could add the following to your subsampling scheme, but it’s more of a hack than understanding what’s actually not working as expected:

    region:
      group_by: "division year month"
      max_sequences: 500
      exclude: "--exclude-where 'region!={region}'"

juan_dc · May 13, 2022, 12:53am

This is the result I got after adding those lines. In other attempts I have obtained similar results, but I can’t get only the South American region to be represented.

juan_dc · May 13, 2022, 2:54am

In this test I changed the country to Spain, and now Europe disappears.

lines:

inputs:
  - name: prueba-2-2
    metadata: data/prueba1/prueba_global.tsv
    sequences: data/prueba1/prueba_global.fasta

builds:
  global_prueba2-2:
    subsampling_scheme: spain_sampling
    region: Europe
    country: Spain

subsampling:
  spain_sampling:
    country:
      group_by: "division year month"
      max_sequences: 100
      exclude: "--exclude-where 'country!={country}'"
    region:
      group_by: "country year month"
      max_sequences: 50
      exclude: "--exclude-where 'region!={region}'"
      priorities:
        type: "proximity"
        focus: "country"
    global:
      group_by: "country year month"
      seq_per_group: 10
      exclude: "--exclude-where 'region={region}'"
      priorities:
        type: "proximity"
        focus: "country"

files:
  auspice_config: "my_profiles/my_runs/test2-2/my_auspice_config.json"
  description: "my_profiles/my_runs/test2-2/my_description.md"

james · May 13, 2022, 5:16am

Looks like you’ve got this working now

Do you want to hide all non-South-American demes from the map?

P.S. While DTA will show plenty of lines, given the subsampling employed they should be treated with extreme caution!

juan_dc · May 13, 2022, 3:04pm

But my point is that in South America you only see Colombia; the rest of the countries (Argentina, Peru, Ecuador…) must be represented in a circle that corresponds to the region of South America, as in the other regions of the world (North America, Europe…).

I hope you can help me. I have tried different configurations and have not achieved my goal.

As for the lines, that is because I am using a test dataset; once I am clear about how to perform my analysis, I will run it with my real dataset.

The goal I am hoping for:

james · May 15, 2022, 10:16pm

the rest of the countries (Argentina, Peru, Ecuador…) must be represented in a circle that corresponds to the region of South America, as in the other regions of the world (North America, Europe…)

This isn’t possible with the current workflow but it shouldn’t be too hard for you to do this.

Option 1. Modify the adjust metadata regions rule to use the country instead of the region, and adjust the underlying script accordingly so that any sample not in Colombia has its metadata changed appropriately.

Option 2. Use a script to change your metadata before running the pipeline so that the country field is as you desire. Note that this will slightly change the subsampling algorithm, as your current scheme groups by “country” (which you will have replaced with the corresponding region).
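For Option 2, a minimal sketch of such a pre-processing script might look like the following. It assumes the metadata is tab-separated with `region` and `country` columns; the function name `collapse_countries` and the example rows are hypothetical, and only the country field is touched (division is left as-is).

```python
import csv
import io


def collapse_countries(in_handle, out_handle, focal_country):
    """Copy a metadata TSV, replacing country with region for every
    sample whose country is not focal_country.
    Hypothetical helper, not part of the workflow."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        if row["country"] != focal_country:
            row["country"] = row["region"]
        writer.writerow(row)


# Made-up metadata standing in for the real TSV.
source = io.StringIO(
    "strain\tregion\tcountry\tdivision\n"
    "a\tSouth America\tColombia\tAntioquia\n"
    "b\tSouth America\tPeru\tLima\n"
    "c\tEurope\tFrance\tParis\n"
)
out = io.StringIO()
collapse_countries(source, out, "Colombia")
print(out.getvalue())
```

With real files, the two handles would be opened with `open(..., newline="")` for the input metadata and for a new output path, and the new file would then be used as the metadata input of the build. As noted above, grouping by "country" during subsampling will then effectively group non-Colombian samples by region.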

james · May 16, 2022, 8:03pm

Great! Here are the various ways you can share data through nextstrain.org - for a one-off perhaps the easiest is the community (GitHub) approach.

juan_dc · June 23, 2022, 4:23pm

It took some time but it worked.
Thank you
